Composite distributions based on specified marginal distributions and a specified
Pearson product-moment correlation structure are formed by mixing extreme-correlation
distributions of a multivariate random variable and the joint distribution under
independence. Closed-form expressions are provided for the composition probabilities
for composite distributions for trivariate random variables, and a simple algorithm
for finding composition probabilities in the case of quadravariate random variables
is presented. A linear program provides a general approach for finding composition
probabilities. For all but the extreme correlation structures, a range of composite
distributions is provided. Composite distributions are used to generate coefficients
for 1120 two-dimensional knapsack problems based on a variety of Pearson correlation
structures. An equal number of problems is generated based on Spearman rank
correlation structures. The computational results with a branch-and-bound procedure
and a well-known heuristic indicate that the type of correlation structure induced
(Pearson or Spearman) can affect the performance of solution procedures. The
correlation structure specified matters, as do the values specified for each
correlation term. There is a noticeable interaction between the correlation structure
induced and the constraint slackness settings. Finally, the interconstraint
correlation is found to affect solution procedure performance more than either of
the objective-constraint correlations.

CHAPTER I
INTRODUCTION 


It  is  easy  to  envision  a  customer  who  requires  a  long  service  time  at  one  service 
station  in  a  tandem  queueing  system  also  requiring  long  service  times  at  subsequent 
stations.  Similar  arguments  apply  to  products  requiring  long  processing  times  at 
one  station  in  a  serial  production  line.  In  thinking  about  optimization  problems, 
one  can  imagine  a  direct,  though  imperfect,  relationship  between  the  cost  of  a 
product and the number of resource units that are needed to produce that product.

Users of simulation techniques need the ability to generate values of multivariate
random variables with specified dependencies, or population correlation
structure,  to  accurately  simulate  real  phenomena.  This  is  true  for  simulations 
of  serial  manufacturing  systems,  as  well  as  empirical  evaluations  of  optimization 
solution  methods. 

Synthetic optimization problems are (most often) randomly generated optimization
problems that provide test cases for empirical evaluations of optimization
solution methods. These studies are of greatest value when the synthetic problems
have characteristics similar to those of problems encountered in practice or a
variety of characteristics so that the range of a solution procedure’s performance may
be determined. In many studies of optimization methods, researchers assume that
all coefficients are mutually independent, but as with other practical simulations,
this independence assumption may not reflect real-life dependencies.

The  results  of  some  empirical  studies  on  the  performance  of  algorithms  and 
heuristics  indicate  that  the  correlation  between  objective  function  and  constraint 
coefficients  influences  the  performance  of  solution  methods.  Multivariate  sampling 
would  facilitate  the  generation  of  values  of  multivariate  random  variables  with 
realistic  or  prescribed  population  correlation  structures  to  represent  the  coefficients 
in  the  objective  function  and  more  than  one  constraint.  Such  sampling  would 
lead  to  a  deeper  understanding  of  solution  procedure  performance  when  there  are 
dependencies  among  the  problem  coefficients. 

The  goals  of  this  research  are:  (1)  to  develop  a  methodology  for  generating 
values of multivariate random variables with specified marginal distributions and
a  specified  population  correlation  structure,  (2)  to  demonstrate  the  use  of  this 
methodology  in  generating  synthetic  optimization  problems,  and  (3)  to  conduct 
an  empirical  study  to  assess  the  influence  of  the  population  correlation  structure 
on  the  performance  of  optimization  solution  methods. 


1.1  Dissertation  Format 

This  dissertation  contains  two  self-contained  papers.  The  first  paper,  provided  here 
as  Chapter  2,  presents  a  methodology  for  constructing  composite  distributions  for 
multivariate  random  variables  with  specified  marginal  distributions  and  a  specified 
correlation  structure.  The  second  paper,  presented  here  as  Chapter  3,  presents  the 
results of an empirical study of the influence of population correlation structure
of the coefficients in synthetic two-dimensional knapsack problems on solution
procedure performance.


1.1.1  Overview  of  Chapter  2 

Extreme-correlation distributions are joint distributions in which all pairwise
population correlations have either their most positive or their most negative possible
values. These distributions are the building blocks for a class of multivariate
composite distributions. Composite distributions constructed from the
extreme-correlation distributions and the joint distribution under independence form an
even  richer  class  of  distributions.  Both  classes  of  composite  distributions  apply 
to  multivariate  discrete  and  continuous  random  variables.  They  facilitate  more 
realistic  simulations  of  practical  systems,  such  as  manufacturing  and  other  tandem 
queueing  systems,  as  well  as  more  comprehensive  computational  experiments  on 
optimization  methods. 


1.1.2  Overview  of  Chapter  3 

Chapter  3  presents  an  empirical  study  of  the  effects  on  solution  methods  of  the 
population  correlation  structure  among  the  coefficient  types  in  the  two-dimensional 
knapsack  problem  (2KP).  The  composite  distributions  presented  in  Chapter  2 
provide  the  requisite  foundation  for  multivariate  sampling  for  generating  values 
of coefficients with specified Pearson product-moment correlation structures for
synthetic 2KPs. Additional instances of 2KPs with specified Spearman rank
correlation structures are also generated using a known sampling technique. A total
of 2240 2KP instances are generated based on two correlation measures, various
population correlation structures (matrices), and four different constraint slackness
settings. All of the problems are solved with a commercially available
branch-and-bound code and a well-known heuristic.


1.2  Contributions  of  the  Research 


This research makes two principal contributions. The first is the characterization
of composite distributions for multivariate random variables. Straightforward
procedures for generating values of multivariate random variables with a specified
population correlation structure make these characterizations an easy way to
induce  realistic  dependencies  in  simulated  data.  The  second  contribution  is  an 
increased  understanding  of  the  effect  on  solution  procedure  performance  of  the 
correlation  structure  among  the  coefficient  types  in  2KP  instances. 

Chapter 2 contains theory providing a foundation for characterizing multivariate
composite distributions. More specifically, Chapter 2 presents

•  characterizations  of  extreme-correlation  distributions  for  both  discrete  and 
continuous  multivariate  random  variables, 

•  methods  for  characterizing  multivariate  distributions,  based  on  a  specified 
Pearson  product-moment  correlation  structure,  as  a  composition  of  extreme- 
correlation  distributions  and  the  joint  distribution  under  independence, 

•  closed-form methods for characterizing composite distributions for trivariate
random variables, and a simple procedure for finding composition probabilities
for quadravariate random variables, and

•  methods  for  selecting  feasible  correlation  structures  for  both  trivariate  and 
quadravariate  random  variables. 




Chapter 3 demonstrates the practicality of using composite distributions to
induce correlation explicitly among three types of coefficients in 2KP. More
specifically, Chapter 3 presents


•  an experiment design for investigating solution procedure performance that
takes advantage of multivariate explicit correlation induction, and treats the
population correlation structure and the correlation measure as factors in
the experiment, and

•  insights regarding the synergistic effect between population correlation
structure and constraint slackness on both the characteristics of the synthetic 2KP
and the ability of solution procedures to solve the problem.


CHAPTER  II 

MULTIVARIATE  COMPOSITE 
DISTRIBUTIONS  FOR  COEFFICIENTS  IN 
SYNTHETIC  OPTIMIZATION  PROBLEMS 


2.1  Introduction 


This chapter presents a characterization of composite distributions for multivariate
random variables with specified marginal distributions and a specified Pearson
product-moment population correlation structure. A composite distribution for
a multivariate random variable $Y = (Y_1, Y_2, \ldots, Y_k)$ is a distribution that may
be represented as a convex combination of other valid distributions for $Y$. The
Pearson product-moment correlation between random variables $Y_i$ and $Y_j$, where

\[
\mathrm{Corr}(Y_i, Y_j) = \frac{\mathrm{E}(Y_i Y_j) - \mathrm{E}(Y_i)\,\mathrm{E}(Y_j)}{\left(\mathrm{Var}(Y_i)\,\mathrm{Var}(Y_j)\right)^{1/2}},
\tag{2.1}
\]

is a measure of the strength of the linear relationship between $Y_i$ and $Y_j$.

The principal motivation for this research is the generation of synthetic
optimization problems, which is too infrequently viewed as an application of
multivariate sampling. (However, this research is applicable to many other simulation
applications, e.g., simulations of manufacturing systems.) Synthetic optimization
problems are used to test and compare algorithms and heuristics because of
limited supplies of real-life test problems. A common practice is generating the
coefficients for these problems under mutual independence. This approach alone
is inadequate when there may be dependencies among the coefficients in real-life
problems. An alternative approach is to induce several structured dependencies
between some types of coefficients to provide a greater variety of test problems,
from easy ones to difficult ones, and a higher degree of realism in the test problems.
The composite distributions described in this chapter facilitate the generation of
synthetic optimization problems with a dependence structure represented by a
Pearson product-moment population correlation matrix.

Some researchers have induced correlation between objective function and
constraint coefficients in synthetic optimization problems and found that the level
of correlation is related to the performance of solution methods (Martello and
Toth, 1979, 1981, 1988; Balas and Zemel, 1980; Balas and Martin, 1980; Potts
and Van Wassenhove, 1988; Guignard and Rosenwein, 1989; John, 1989; Reilly,
1991; Rushmeier and Nemhauser, 1993; Moore and Reilly, 1993; Amini and Racer,
1994; Cario et al., 1995). The correlations between the coefficients in different
constraints may also be related to solution method performance, but this
possibility has only been systematically investigated by Hill (Chapter 3). He uses the
characterizations of multivariate distributions presented in this paper to
investigate the effects of interconstraint correlation, as well as those of the correlations
between the objective function and constraint coefficients, on the performance of
standard solution methods on two-dimensional knapsack problems. He observes
that the interconstraint correlation term has at least as significant a relationship
to solution method performance as the correlations between the objective function
coefficients and the coefficients in either of the constraints.

There are usually an infinite number of ways to characterize a joint distribution
for $Y$ when the marginal distributions for $Y_i$, $i = 1, 2, \ldots, k$, and a population
correlation structure are specified. Composite distributions, and in particular those
whose constituent joint distributions have a simple form, are easy to sample from.
Consequently, attention here is restricted to multivariate composite distributions
that are composed of the joint distribution under independence and the
extreme-correlation distributions, the $2^{k-1}$ distributions of $Y$ for which
$\mathrm{Corr}(Y_i, Y_j)$ has either its most positive or most negative possible value for
all $i < j \le k$.

This paper is organized as follows. In §2.2, implicit and explicit correlation
induction methods for generating coefficients for synthetic optimization problems are
discussed. Basic concepts for composite distributions for multivariate random
variables are presented in §2.3. Extreme-correlation distributions for multivariate
random variables are characterized and used in conjunction with the joint distribution
under independence to construct multivariate composite distributions. Composite
distributions for trivariate random variables are constructed using closed-form
formulas for the composition probabilities in §2.4. The limitations of extending the
composition probability formulas for trivariate random variables to multivariate
random variables are discussed in §2.5. A composition weight adjustment
technique for constructing composite distributions for quadravariate random variables
is presented in §2.6. The contributions of this research and areas of further
investigation are summarized in §2.7.

Some of the early results of this research appear in Hill and Reilly (1994).


2.2  Background 

How the correlation structure of multivariate random variables is modeled and
varied can affect the results and conclusions in computational experiments and other
simulation applications. For certain classes of discrete optimization problems,
randomly generated instances with high, positive correlation between objective
function and constraint coefficients are relatively hard to solve with enumerative
procedures. Such results are reported by Martello and Toth (1979, 1988), Balas and
Zemel (1980), and Reilly (1991) for knapsack problems; Balas and Martin (1980)
for capital budgeting problems; Rushmeier and Nemhauser (1993) and Moore and
Reilly (1993) for set covering problems; and Potts and Van Wassenhove (1988,
1992) and John (1989) for scheduling problems. Instances of the generalized
assignment problem with high, negative correlation between the objective function
and capacity constraint coefficients are relatively hard to solve with enumerative
procedures (Martello and Toth, 1981; Fisher, Jaikumar, and Van Wassenhove,
1986; Guignard and Rosenwein, 1989; Trick, 1992; Mazzola and Neebe, 1993;
Amini and Racer, 1994; Cario et al., 1995).




In the rest of this section, correlation-induction methods that have been used
to generate coefficients for synthetic optimization problems are discussed.


2.2.1  Implicit  correlation  induction 

Moore and Reilly (1993) discuss three ways to generate bivariate random
variables as coefficients for synthetic optimization problems: mutual independence,
implicit  correlation  induction,  and  explicit  correlation  induction.  They  classify  as 
implicit  correlation  induction  any  generation  method  for  which  the  parameters  of 
the  method  imply  the  population  correlation  between  two  random  variables. 

The implicit correlation induction method in Martello and Toth (1979), which
has been widely mimicked by others, induces dependence between two random
variables $Y_1$ and $Y_2$ by generating a value for $Y_1$ and then a value of
$Y_2 = Y_1 + W$, where $W$ is an independently generated noise term. For instance, they let
$Y_1 \sim U\{1, 2, \ldots, 100\}$ (i.e., $Y_1$ has a discrete uniform distribution over the integers
from 1 to 100) and $W \sim U\{-10, -9, \ldots, 10\}$ when generating objective function
($Y_2$) and constraint ($Y_1$) coefficients for knapsack problems. The implied value of
$\rho = \mathrm{Corr}(Y_1, Y_2)$ is above 0.97 in this case. Martello and Toth (1981) let
$Y_2 = 111 - Y_1 + W$ for generalized assignment problems, and the implied value of $\rho$ is
below $-0.97$. Martello and Toth, as well as other authors, call such population
correlation levels “weak.” However, it is not clear whether any of these authors knows
the magnitudes of the values of $\rho$ implied by the parameters used in their problem
generation methods.

One may change the support of $Y_1$ or of $W$, or multiply either of these random
variables by a constant, and systematically vary the implied population correlation.
Such an approach was used by Cario et al. (1995).
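
The magnitudes quoted above can be checked from the variances alone. Because $W$ is independent of $Y_1$, $\mathrm{Corr}(Y_1, \pm Y_1 + W) = \pm\left(\mathrm{Var}(Y_1) / (\mathrm{Var}(Y_1) + \mathrm{Var}(W))\right)^{1/2}$. The following is a minimal sketch of that calculation for the discrete uniform supports described above; the function names are illustrative and do not come from the cited papers.

```python
import math


def discrete_uniform_variance(low, high):
    """Variance of a discrete uniform distribution on the integers low..high."""
    n = high - low + 1
    return (n * n - 1) / 12.0


def implied_correlation(var_y1, var_w, sign=1):
    """Corr(Y1, Y2) when Y2 = c + sign * Y1 + W and W is independent of Y1."""
    return sign * math.sqrt(var_y1 / (var_y1 + var_w))


var_y1 = discrete_uniform_variance(1, 100)   # Y1 ~ U{1, ..., 100}
var_w = discrete_uniform_variance(-10, 10)   # W ~ U{-10, ..., 10}

rho_knapsack = implied_correlation(var_y1, var_w)       # Y2 = Y1 + W
rho_gap = implied_correlation(var_y1, var_w, sign=-1)   # Y2 = 111 - Y1 + W
print(round(rho_knapsack, 4), round(rho_gap, 4))
```

Both values have magnitude of roughly 0.979, consistent with the bounds stated in the text. Widening the support of $W$ in the call to `discrete_uniform_variance` lowers the implied correlation, which is the kind of systematic variation described above.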

Other  implementations  of  implicit  correlation  induction  include  Balas  and  Zemel 
(1980),  Balas  and  Martin  (1980),  Potts  and  Van  Wassenhove  (1988),  Guignard  and 
Rosenwein  (1989),  John  (1989),  Rushmeier  and  Nemhauser  (1993),  and  Amini 
and  Racer  (1994).  Balas  and  Martin  (1980)  generate  capital  budgeting  problems 
with  implicitly  induced  interconstraint  correlations,  as  well  as  objective  function- 
constraint  correlations. 


2.2.2  Explicit  correlation  induction 

According to Moore and Reilly (1993), an explicit correlation induction method is
one  where  the  user  specifies  the  population  correlation  structure. 

Fréchet (1951) characterized bounds on joint probability distributions for $(Y_1, Y_2)$
as

\[
H^{-}(y_1, y_2) = \max\{F_1(y_1) + F_2(y_2) - 1,\ 0\}
\tag{2.2}
\]

and

\[
H^{+}(y_1, y_2) = \min\{F_1(y_1),\ F_2(y_2)\},
\tag{2.3}
\]

where $F_1(y_1)$ is the cumulative distribution function (cdf) for $Y_1$, and $F_2(y_2)$ is
the cdf for $Y_2$. $H^{-}(y_1, y_2)$ and $H^{+}(y_1, y_2)$ are, respectively, the minimum- and
maximum-correlation joint cdfs for $(Y_1, Y_2)$. Fréchet shows that

\[
H^{-}(y_1, y_2) \le H(y_1, y_2) \le H^{+}(y_1, y_2)
\tag{2.4}
\]

for all $(y_1, y_2)$ and all possible joint distributions $H(y_1, y_2)$.

Fréchet uses the bounding distributions (2.2) and (2.3), along with the joint
distribution under independence, $F_1(y_1)F_2(y_2)$, to characterize two classes of
composite distributions for $(Y_1, Y_2)$:

\[
\lambda H^{-}(y_1, y_2) + (1 - \lambda) H^{+}(y_1, y_2), \qquad 0 \le \lambda \le 1,
\tag{2.5}
\]

and

\[
(1 - a - b) F_1(y_1) F_2(y_2) + a H^{-}(y_1, y_2) + b H^{+}(y_1, y_2), \qquad a, b \ge 0,\ a + b \le 1.
\tag{2.6}
\]

The weights $\lambda$ and $(1 - \lambda)$ in (2.5) and $a$, $b$, and $(1 - a - b)$ in (2.6) are referred to
as composition probabilities. Nelsen (1987) also describes composite distributions
(2.6).

A class of joint distributions for $(Y_1, Y_2)$ is comprehensive if the class includes
the boundary distributions (2.2) and (2.3) and $F_1(y_1)F_2(y_2)$ (Devroye, 1986). The
class of composite distributions (2.6) is comprehensive, while the class (2.5) is not.

Extreme  mixtures 

The composite distributions (2.5) are sometimes referred to as extreme mixtures
because they are composed of just the extreme-correlation distributions for $(Y_1, Y_2)$,
$H^{-}(y_1, y_2)$ and $H^{+}(y_1, y_2)$. Suppose $\rho$ is specified and
$\lambda = (\rho^{+} - \rho)/(\rho^{+} - \rho^{-})$,
where $\rho^{+}$ and $\rho^{-}$ are, respectively, the maximum and minimum possible values of
$\rho$. Then there is a unique extreme mixture for $(Y_1, Y_2)$ for each value of $\rho$ such that
$\rho^{-} \le \rho \le \rho^{+}$. Extreme mixtures are easy to use but cannot generate observations
of $(Y_1, Y_2)$ with $Y_1$ and $Y_2$ independent because extreme mixtures do not form a
comprehensive class of distributions for $(Y_1, Y_2)$. Extreme mixtures apply to both
discrete and continuous random variables.
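
Sampling from an extreme mixture can be sketched with the standard inverse-transform construction: a common uniform variate $U$ fed to both inverse cdfs gives a pair distributed according to $H^{+}$, while $U$ and $1 - U$ give a pair distributed according to $H^{-}$. The sketch below assumes the caller supplies the inverse cdfs and the values of $\rho^{+}$ and $\rho^{-}$; the function names and the uniform marginals in the illustration are assumptions made for this example only.

```python
import random


def extreme_mixture_weight(rho, rho_plus, rho_minus):
    """Weight lambda placed on H^- in (2.5) so that the mixture has correlation rho."""
    if not rho_minus <= rho <= rho_plus:
        raise ValueError("rho must lie between rho_minus and rho_plus")
    return (rho_plus - rho) / (rho_plus - rho_minus)


def sample_extreme_mixture(inv_cdf1, inv_cdf2, lam, rng):
    """Draw one (y1, y2) pair from lam * H^- + (1 - lam) * H^+."""
    u = rng.random()
    if rng.random() < lam:
        # Minimum-correlation component: antithetic uniforms.
        return inv_cdf1(u), inv_cdf2(1.0 - u)
    # Maximum-correlation component: common uniform.
    return inv_cdf1(u), inv_cdf2(u)


# Illustration with two U(0, 1) marginals, for which rho_plus = 1 and rho_minus = -1.
rng = random.Random(12345)
lam = extreme_mixture_weight(0.4, 1.0, -1.0)
pairs = [sample_extreme_mixture(lambda u: u, lambda u: u, lam, rng)
         for _ in range(20000)]
```

With these inputs the weight on $H^{-}$ is 0.3, and the sample correlation of `pairs` should be close to the target of 0.4. No pair is ever drawn with independent components, which reflects the limitation noted above.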

Conventional mixtures

The composite distributions (2.6) are sometimes referred to as conventional
mixtures when either $a = 0$ or $b = 0$. Suppose $\rho$ is specified. Then, $a = 0$ and
$b = \rho/\rho^{+}$ if $\rho \ge 0$, and $a = \rho/\rho^{-}$ and $b = 0$ if $\rho < 0$. Conventional mixtures form
a comprehensive class of distributions for $(Y_1, Y_2)$, are easy to use, and provide a
unique distribution for all values of $\rho$ such that $\rho^{-} \le \rho \le \rho^{+}$. However, a
conventional mixture cannot generate values of $(Y_1, Y_2)$ with $Y_1$ and $Y_2$ uncorrelated but
dependent. Conventional mixtures apply to both discrete and continuous random
variables. Conventional mixtures are also discussed by Schmeiser and Lai (1982)
and Nelsen (1987).

Moore and Reilly (1993) generate set covering problem coefficients based on
conventional mixtures.


Parametric  mixtures 

For finite discrete random variables $Y_1$ and $Y_2$, Peterson and Reilly (1993) describe
a special case of the distributions (2.6) which they refer to as parametric mixtures.

Define $\theta$ to be the minimum joint probability for any value $(y_1, y_2)$ in the
support of $(Y_1, Y_2)$. Let $f_1(y_1)$ and $f_2(y_2)$ be the probability mass functions (pmfs)
for $Y_1$ and $Y_2$, respectively. Also let $i^* = \arg\min_i \{f_1(y_{1i})\}$;
$j^* = \arg\min_j \{f_2(y_{2j})\}$; and $\theta^{+} = f_1(y_{1i^*}) f_2(y_{2j^*})$.
Suppose that $(\rho, \theta)$ is a point such that

\[
0 \le \theta \le \theta^{+}
\tag{2.7}
\]

and

\[
(1 - \theta/\theta^{+})\rho^{-} \le \rho \le (1 - \theta/\theta^{+})\rho^{+}.
\tag{2.8}
\]

Suppose also that $h^{+}(y_{1i^*}, y_{2j^*}) = h^{-}(y_{1i^*}, y_{2j^*}) = 0$, where $h^{+}(y_1, y_2)$ and
$h^{-}(y_1, y_2)$ are the maximum- and minimum-correlation pmfs for
$(Y_1, Y_2)$, respectively. Peterson and Reilly show that $\mathrm{Corr}(Y_1, Y_2) = \rho$ and the
minimum joint probability is $\theta$ for the following composite distribution:

\[
(1 - a - b) f_1(y_1) f_2(y_2) + a h^{-}(y_1, y_2) + b h^{+}(y_1, y_2),
\tag{2.9}
\]

where

\[
a = \frac{(1 - \theta/\theta^{+})\rho^{+} - \rho}{\rho^{+} - \rho^{-}}
\tag{2.10}
\]

and

\[
b = \frac{\rho - (1 - \theta/\theta^{+})\rho^{-}}{\rho^{+} - \rho^{-}}.
\tag{2.11}
\]

There are an infinite number of parametric mixtures (2.9) for each value of $\rho$
such that $\rho^{-} < \rho < \rho^{+}$, but a unique parametric mixture for each point $(\rho, \theta)$ that
makes either inequality in (2.8) active. For bivariate discrete random variables,
extreme and conventional mixtures are special cases of parametric mixtures.

Reilly (1991) generates knapsack problems based on parametric mixtures. Yang
(1994) generates knapsack problems and Cario et al. (1995) generate generalized
assignment problems based on parametric mixtures, including extreme mixtures
(2.5) and conventional mixtures.
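
The composition probabilities of a parametric mixture follow directly from the weight formulas for $a$ and $b$ given above, namely $a = ((1 - \theta/\theta^{+})\rho^{+} - \rho)/(\rho^{+} - \rho^{-})$ and $b = (\rho - (1 - \theta/\theta^{+})\rho^{-})/(\rho^{+} - \rho^{-})$. The sketch below evaluates them and rejects points $(\rho, \theta)$ outside the admissible region; the function name and the numerical inputs are hypothetical and chosen only for illustration.

```python
def parametric_mixture_weights(rho, theta, theta_plus, rho_plus, rho_minus):
    """Composition probabilities (a, b) of a parametric mixture, per (2.10)-(2.11)."""
    if not 0.0 <= theta <= theta_plus:
        raise ValueError("theta must satisfy 0 <= theta <= theta_plus")
    scale = 1.0 - theta / theta_plus
    if not scale * rho_minus <= rho <= scale * rho_plus:
        raise ValueError("(rho, theta) is outside the admissible region")
    a = (scale * rho_plus - rho) / (rho_plus - rho_minus)   # weight on h^-
    b = (rho - scale * rho_minus) / (rho_plus - rho_minus)  # weight on h^+
    return a, b


# Hypothetical inputs: rho_plus = 0.9, rho_minus = -0.8, theta_plus = 0.04.
a, b = parametric_mixture_weights(0.3, 0.01, 0.04, 0.9, -0.8)
print(a, b, 1.0 - a - b)
```

Two checks follow from the formulas: $a + b = 1 - \theta/\theta^{+}$, so the independence component receives weight $\theta/\theta^{+}$, and $a\rho^{-} + b\rho^{+} = \rho$. Setting `theta` to 0 removes the independence component and reproduces the extreme-mixture weight $(\rho^{+} - \rho)/(\rho^{+} - \rho^{-})$ on $h^{-}$.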


2.2.3  Explicit  rank  correlation  induction 

Iman and Conover (1982) describe a method for generating $n$ samples of a $k$-variate
random variable $Y$ with specified marginal distributions and a specified Spearman
rank population correlation structure. Their method shuffles $n$ independently
generated components of a multivariate random variable across $k$ vectors so that the
sample Spearman correlation structure approximates a specified Spearman rank
population correlation structure, $M$. First generate two matrices $R$ and $V$ such
that $R$ is an $(n \times k)$ matrix of van der Waerden scores, randomized within each
of the $k$ columns, and $V$ is an $(n \times k)$ matrix of $n$ independent observations of
each of the $k$ random variables. Consider each column of $R$ as $n$ observations of
one of $k$ random variables and compute $T$, the corresponding sample rank correlation
matrix. Compute the Choleski factorizations $A$ and $Q$ such that $T = AA'$ and
$M = QQ'$. Compute

\[
S = R(QA^{-1})',
\tag{2.12}
\]

which is a transformed matrix of scores. The $k$ columns of $n$ values in $S$ have
a sample rank correlation structure that approximates $M$. The entries in each
column of $V$ are reordered so that their rankings are the same as the rankings in
the corresponding columns of $S$. The sample Spearman rank correlation structure
of the shuffled matrix of observations, $V$, approximates the specified correlation
structure, $M$.
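
The shuffling procedure can be sketched in pure Python as follows. This is an illustrative reading of the steps above, not Iman and Conover's own code: the helper names are assumptions, ties are broken by position, and the transformation is written as $S = RB'$ with $B = QA^{-1}$, which is the choice that gives $S$ the sample correlation matrix $M$ when $T = AA'$ and $M = QQ'$.

```python
import math
import random
from statistics import NormalDist


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of a list of rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) for j in range(k)] for i in range(k)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(k)]
            for i in range(k)]


def cholesky(m):
    """Lower-triangular L with L L' = m, for a positive definite matrix m."""
    k = len(m)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / low[j][j]
    return low


def lower_inverse(low):
    """Inverse of a lower-triangular matrix by forward substitution."""
    k = len(low)
    inv = [[0.0] * k for _ in range(k)]
    for c in range(k):
        for i in range(c, k):
            s = sum(low[i][p] * inv[p][c] for p in range(c, i))
            inv[i][c] = ((1.0 if i == c else 0.0) - s) / low[i][i]
    return inv


def ranks(column):
    """Zero-based ranks of the entries of a column (ties broken by position)."""
    order = sorted(range(len(column)), key=column.__getitem__)
    result = [0] * len(column)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def iman_conover(v_rows, target, rng):
    """Reorder each column of v_rows so its rank correlations approximate target."""
    n, k = len(v_rows), len(v_rows[0])
    scores = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    columns = [rng.sample(scores, n) for _ in range(k)]       # R, column by column
    r_rows = [[columns[j][i] for j in range(k)] for i in range(n)]
    a = cholesky(correlation_matrix(r_rows))                   # T = A A'
    q = cholesky(target)                                       # M = Q Q'
    a_inv = lower_inverse(a)
    b = [[sum(q[i][p] * a_inv[p][j] for p in range(k)) for j in range(k)]
         for i in range(k)]                                    # B = Q A^{-1}
    s_rows = [[sum(b[j][p] * row[p] for p in range(k)) for j in range(k)]
              for row in r_rows]                               # S = R B'
    shuffled = [[0.0] * k for _ in range(n)]
    for j in range(k):
        sorted_v = sorted(row[j] for row in v_rows)
        for i, rank in enumerate(ranks([row[j] for row in s_rows])):
            shuffled[i][j] = sorted_v[rank]
    return shuffled
```

A call such as `iman_conover(v_rows, m, random.Random(7))`, with `v_rows` a list of $n$ rows of independently generated values and `m` a positive definite target matrix, returns the same column values in a new order. The marginal samples are therefore preserved exactly, and only the pairing across columns changes.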

Hill (Chapter 3) compares the performance of an algorithm and a heuristic on
two-dimensional knapsack problems generated based on composite distributions
with specified Pearson product-moment population correlation structures and with
Iman and Conover’s method for the same Spearman rank population correlation
structures.


2.3  Explicit Correlation Induction for Multivariate Random Variables Using Composition

In this section, extreme-correlation distributions for a multivariate random variable
$Y$ are characterized and used to construct composite distributions with a specified
correlation structure. A procedure for generating samples from composite
distributions of multivariate random variables is presented.


2.3.1  Extreme-correlation  distributions  for  Y 

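Assume there are $k$ random variables, $Y_i$, $i = 1, 2, \ldots, k$. Each $Y_i$ has support $S_i$, is
distributed according to $f_i(y_i)$, and has cdf $F_i(y_i)$. Let $Y = (Y_1, Y_2, \ldots, Y_k)$ and let
$y = (y_1, y_2, \ldots, y_k)$ be a value of $Y$. Let $S = S_1 \times S_2 \times \cdots \times S_k$ be the support of $Y$. Each
feasible joint distribution for $Y$, $h(y)$, has a Pearson product-moment correlation
structure whose correlation terms comprise a vector, $\rho = (\rho_{12}, \rho_{13}, \ldots, \rho_{k-1,k})$,
where $\rho_{ij} = \mathrm{Corr}(Y_i, Y_j)$ for all $i < j \le k$.

Let $\Phi_{ij}$ be the set of all feasible bivariate distributions $h_{ij}(y_i, y_j)$ for $(Y_i, Y_j)$,
for all $i < j \le k$. Let $K^{+}_{ij} = \max_{h_{ij} \in \Phi_{ij}} \{\mathrm{E}(Y_i Y_j)\}$ and
$K^{-}_{ij} = \min_{h_{ij} \in \Phi_{ij}} \{\mathrm{E}(Y_i Y_j)\}$
for all $i < j \le k$. The maximum and minimum values of each $\rho_{ij}$ are

\[
\rho^{+}_{ij} = \frac{K^{+}_{ij} - \mathrm{E}(Y_i)\,\mathrm{E}(Y_j)}{\left(\mathrm{Var}(Y_i)\,\mathrm{Var}(Y_j)\right)^{1/2}}
\tag{2.13}
\]

and

\[
\rho^{-}_{ij} = \frac{K^{-}_{ij} - \mathrm{E}(Y_i)\,\mathrm{E}(Y_j)}{\left(\mathrm{Var}(Y_i)\,\mathrm{Var}(Y_j)\right)^{1/2}},
\tag{2.14}
\]

respectively. Peterson (1990) presents a factored transportation problem to find
$K^{+}_{ij}$ or $K^{-}_{ij}$ for finite discrete random variables $Y_i$ and $Y_j$. He finds $K^{+}_{ij}$ with the
Northwest Corner Rule (NWCR) and uses the Southwest Corner Rule (SWCR) to
find $K^{-}_{ij}$. For a general random variable $(Y_i, Y_j)$,

\[
K^{+}_{ij} = \int_0^1 F_i^{-1}(u)\, F_j^{-1}(u)\, du
\tag{2.15}
\]

and

\[
K^{-}_{ij} = \int_0^1 F_i^{-1}(u)\, F_j^{-1}(1 - u)\, du.
\tag{2.16}
\]
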
Define a correlation point, $\rho = (\rho_{12}, \rho_{13}, \ldots, \rho_{k-1,k})$, as the $\binom{k}{2}$-vector of
correlation values associated with some $h(y)$. An extreme-correlation point for $Y$ is
a correlation point where either $\rho_{ij} = \rho^{+}_{ij}$ or $\rho_{ij} = \rho^{-}_{ij}$ for all $i < j \le k$. Each
extreme-correlation point is associated with a feasible assignment of $\rho^{+}_{1j}$ or $\rho^{-}_{1j}$
to each $\rho_{1j}$, $j = 2, 3, \ldots, k$. So there are $2^{k-1}$ extreme-correlation points for $Y$.
Denote the extreme-correlation points as $q_\ell$, $\ell = 1, 2, \ldots, 2^{k-1}$.

For each extreme-correlation point, define the vector
$\delta^{\ell} = (\delta^{\ell}_{12}, \ldots, \delta^{\ell}_{k-1,k})$, where

\[
\delta^{\ell}_{ij} =
\begin{cases}
1 & \text{if } \rho_{ij} = \rho^{+}_{ij}; \\
0 & \text{if } \rho_{ij} = \rho^{-}_{ij}.
\end{cases}
\tag{2.17}
\]

To determine the components of each vector $\delta^{\ell}$, the appropriate values are assigned
to $\delta^{\ell}_{1j}$, $j = 2, 3, \ldots, k$, and then the formula

\[
\delta^{\ell}_{ij} = 1 - \left|\delta^{\ell}_{1i} - \delta^{\ell}_{1j}\right|
\tag{2.18}
\]

is used to find the remaining $\delta^{\ell}_{ij}$ values. Tables 2.1 and 2.2 characterize the
extreme-correlation points for $k = 3$ and $k = 4$, respectively. Tables 2.3 and 2.4 provide
the corresponding values for $\delta^{\ell}_{ij}$.
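
The enumeration just described can be sketched in a few lines: choose a 0/1 value for each pair $(1, j)$, $j = 2, \ldots, k$, and fill in every other pair $(i, j)$ as 1 minus the absolute difference of the values chosen for $(1, i)$ and $(1, j)$. The order in which the sketch lists the patterns is an assumption and need not match the numbering used in the tables; the function name is likewise illustrative.

```python
from itertools import product


def extreme_correlation_patterns(k):
    """List the 2**(k-1) indicator vectors, each a dict keyed by (i, j) with i < j."""
    patterns = []
    for first_row in product((1, 0), repeat=k - 1):
        # The indicator for (1, j), j = 2..k, is chosen freely; index 1 is given
        # the value 1 so that the same formula also reproduces the (1, j) entries.
        d1 = {1: 1}
        d1.update(zip(range(2, k + 1), first_row))
        delta = {(i, j): 1 - abs(d1[i] - d1[j])
                 for i in range(1, k + 1) for j in range(i + 1, k + 1)}
        patterns.append(delta)
    return patterns


for number, delta in enumerate(extreme_correlation_patterns(3), start=1):
    print(number, ["+" if delta[pair] else "-" for pair in sorted(delta)])
```

Each printed row lists, in the order (1,2), (1,3), (2,3), whether the corresponding correlation is at its maximum (+) or minimum (-) value. Calling the function with `k = 4` yields eight distinct patterns over six pairs.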


Table 2.1: Extreme-correlation points when $k = 3$


Table 2.2: Extreme-correlation points when $k = 4$

[Table body not reproduced: one row for each extreme-correlation point, $\ell = 1, 2, \ldots, 8$,
with columns $\rho_{12}$, $\rho_{13}$, $\rho_{14}$, $\rho_{23}$, $\rho_{24}$, $\rho_{34}$; each entry is the maximum or
minimum value of the corresponding correlation.]


Define the zero-correlation point as the zero vector of dimension $\binom{k}{2}$ and
denote it $q_0$. Also define $P$ to be the convex hull of the correlation points $q_\ell$,
$\ell = 0, 1, \ldots, 2^{k-1}$. If $\rho \in P$, then there exist values $\lambda_\ell \ge 0$,
$\ell = 0, 1, \ldots, 2^{k-1}$, such that

\[
\rho = \sum_{\ell=0}^{2^{k-1}} \lambda_\ell q_\ell
\tag{2.19}
\]

and

\[
\sum_{\ell=0}^{2^{k-1}} \lambda_\ell = 1.
\tag{2.20}
\]

Figure 2.1 depicts an example set $P$ where $k = 3$. According to Rousseeuw and
Molenberghs (1994), all feasible correlation structures, $(\rho_{12}, \rho_{13}, \rho_{23})$, for such a
trivariate random variable are contained in an elliptical tetrahedron. $P$ is a proper
subset of this elliptical tetrahedron; the extreme points of $P$ in Figure 2.1 are the
extreme-correlation points $q_\ell$, $\ell = 1, 2, 3, 4$, characterized in Table 2.1 and are the
extreme points of Rousseeuw and Molenberghs’ elliptical tetrahedron.

An extreme-correlation distribution for $Y$ is a joint distribution for which either
$\rho_{ij} = \rho^{+}_{ij}$ or $\rho_{ij} = \rho^{-}_{ij}$, for all $i < j \le k$. Denote the $2^{k-1}$ extreme-correlation
distributions as $h_\ell(y)$, $\ell = 1, 2, \ldots, 2^{k-1}$. There is a one-to-one correspondence between
the extreme-correlation points in $P$ and the extreme-correlation distributions of
$Y$.

Figure 2.1: Example of $P$ for $k = 3$


2.3.2  Constructing  composite  distributions 

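Let

\[
h_0(y) = \prod_{i=1}^{k} f_i(y_i)
\tag{2.21}
\]

be the joint distribution of $Y$ under independence. A comprehensive class of
composite distributions for $Y$ is given by:

\[
\Omega = \left\{ h(y) \;\middle|\; h(y) = \sum_{\ell=0}^{2^{k-1}} \lambda_\ell h_\ell(y),\ \sum_{\ell=0}^{2^{k-1}} \lambda_\ell = 1,\ \lambda_\ell \ge 0 \right\}.
\tag{2.22}
\]
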

The class $\Omega$ generalizes the comprehensive class (2.6) of composite distributions for bivariate random variables introduced by Frechet (1951) in the sense that $\Omega$ includes each of $h_\ell(y)$, $\ell = 0, 1, \ldots, 2^{k-1}$. It is clear from the definitions of $\Omega$ and $P$ that $\rho \in P$ if and only if there is some joint distribution $h(y) \in \Omega$ whose correlation structure is given by $\rho$. $P$ contains the correlation points that correspond to correlation structures expressible as convex combinations of $q_\ell$, $\ell = 0, 1, \ldots, 2^{k-1}$, but does not necessarily contain all feasible correlation structures for $Y$, as demonstrated in Rousseeuw and Molenberghs (1994).

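Let $\rho \in P$ represent the desired population correlation structure. Constructing a composite distribution $h(y) \in \Omega$ associated with $\rho \in P$ requires a composition probability vector, $\lambda$, that satisfies the following conditions:

\[ \sum_{\ell=1}^{2^{k-1}} \lambda_\ell \bigl[\delta_{ij}^{\ell}\rho_{ij}^{+} + (1-\delta_{ij}^{\ell})\rho_{ij}^{-}\bigr] = \rho_{ij}, \quad i < j \le k, \tag{2.23} \]

\[ \sum_{\ell=0}^{2^{k-1}} \lambda_\ell = 1, \tag{2.24} \]

\[ \lambda_\ell \ge 0, \quad \ell = 0, 1, \ldots, 2^{k-1}. \tag{2.25} \]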

A composite distribution $h(y) \in \Omega$ with a minimum value of $\lambda_0$ is referred to here as a Type L distribution. In many cases, $\lambda_0 = 0$ for Type L distributions, meaning $h_0(y)$ is not included in the composition. It is easily shown for bivariate random variables that extreme mixtures are Type L composite distributions (see the Appendix to this chapter for details).

A composite distribution $h(y) \in \Omega$ with a maximum value of $\lambda_0$ is referred to here as a Type U distribution. For bivariate random variables, conventional mixtures are Type U distributions (see the Appendix to this chapter for details).

For any $\rho \in P$, the corresponding Type L and Type U composite distributions define a range of composite distributions in $\Omega$ with the correlation structure $\rho$. While the correlation structure for the distributions within this range does not change as $\lambda_0$ changes, the distributions themselves can be strikingly different. Therefore, $\lambda_0$ may be considered an index for the composite distributions in $\Omega$ that are associated with each $\rho \in P$.

Two examples for bivariate random variables are used to illustrate the range of possible composite distributions defined by the limiting Type L and Type U distributions. The first example illustrates how the value of $\lambda_0$ affects the nature of the pmf for $Y$ for discrete random variables. The second example illustrates a similar point when $Y$ is continuous.

Example 1 (Hill and Reilly, 1994). Let $Y_1 \sim U\{1, 2, 3, 4, 5\}$ and let $Y_2$ be a binomial random variable with 3 independent trials and success probability 0.4. Suppose that the desired population correlation value is $\rho = 0.6$. The pmf shown in Figure 2.2 is the Type L composite distribution with $\lambda_0 = 0$, $\lambda_1 = 0.1715$ and $\lambda_2 = 0.8285$. The pmf shown in Figure 2.3 is the Type U composite distribution with $\lambda_0 = 0.3431$, $\lambda_1 = 0$, $\lambda_2 = 0.6569$. Note in Figure 2.2 that the joint probability for 7 of the 20 members of $S_1 \times S_2$ is zero, while each member of $S_1 \times S_2$ has positive probability with the pmf in Figure 2.3. □

Example 2 (Hill and Reilly, 1994). Let $Y_1$ and $Y_2$ be exponential random variables with unit mean and $\rho = 0.4$. From Page (1965) it is known that $\rho^{+} = 1.0$ and $\rho^{-} = 1 - \pi^2/6$. Figure 2.4 shows 1000 observations based on the Type L composite distribution with $\lambda_0 = 0$, $\lambda_1 = 0.36$, and $\lambda_2 = 0.64$. Figure 2.5 shows 1000 observations based on the Type U composite distribution with $\lambda_0 = 0.6$, $\lambda_1 = 0$, and $\lambda_2 = 0.4$. Including the independent pdf in the composition, as in the Type U distribution, leads to a greater variety of possible realizations of $Y$. □
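The bivariate weights quoted in Examples 1 and 2 can be recovered directly from conditions (2.23)-(2.25) with $k = 2$. The following is a minimal sketch, not the authors' code: the function name `bivariate_weights` is hypothetical, and it assumes $\rho^{-} < 0 < \rho^{+}$ and that $\lambda_1$ weights the minimum-correlation distribution while $\lambda_2$ weights the maximum-correlation distribution, as the numbers in Example 2 suggest.

```python
import math

def bivariate_weights(rho, rho_minus, rho_plus):
    """Return (type_L, type_U), each a triple (lam0, lam1, lam2).

    lam0 weights the independent distribution, lam1 the minimum-correlation
    distribution and lam2 the maximum-correlation distribution.
    Assumes rho_minus < 0 < rho_plus (hypothetical helper, not from the text).
    """
    if not rho_minus <= rho <= rho_plus:
        raise ValueError("rho must lie in [rho_minus, rho_plus]")
    # Type L: lam0 = 0, so lam1*rho_minus + lam2*rho_plus = rho, lam1 + lam2 = 1.
    lam2 = (rho - rho_minus) / (rho_plus - rho_minus)
    type_l = (0.0, 1.0 - lam2, lam2)
    # Type U: largest lam0, which leaves a single extreme distribution in use.
    if rho >= 0:
        w = rho / rho_plus
        type_u = (1.0 - w, 0.0, w)
    else:
        w = rho / rho_minus
        type_u = (1.0 - w, w, 0.0)
    return type_l, type_u

# Example 2: unit-mean exponentials, rho = 0.4, rho_minus = 1 - pi^2/6.
type_l, type_u = bivariate_weights(0.4, 1.0 - math.pi ** 2 / 6.0, 1.0)
```

With the inputs of Example 2 the sketch gives Type L weights that round to the quoted 0.36 and 0.64, and Type U weights of 0.6 and 0.4.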
                 $Y_2$
$Y_1$     0        1        2        3
1       0.1657   0        0.0233   0.0110
2       0.0132   0.1607   0.0261   0
3       0        0.2000   0        0
4       0.0028   0.0713   0.1259   0
5       0.0343   0        0.1127   0.0530

Figure 2.2: Example Type L composite pmf, $\rho = 0.6$


                 $Y_2$
$Y_1$     0        1        2        3
1       0.1462   0.0296   0.0198   0.0044
2       0.0254   0.1504   0.0198   0.0044
3       0.0148   0.1610   0.0198   0.0044
4       0.0148   0.0614   0.1194   0.0044
5       0.0148   0.0296   0.1092   0.0464

Figure 2.3: Example Type U composite pmf, $\rho = 0.6$
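The pmfs in Figures 2.2 and 2.3 can be rebuilt by mixing three joint pmfs for the marginals of Example 1: the independent one and the two extreme-correlation ones. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes the maximum-correlation pmf pairs both variables through a common uniform number $u$ and the minimum-correlation pmf pairs $u$ with $1 - u$; the helper names `extreme_pmf`, `composite_pmf` and `correlation` are hypothetical.

```python
from math import comb

def extreme_pmf(pmf1, pmf2, comonotone=True):
    """Joint pmf when both variables are driven by one uniform number.

    pmf1, pmf2: lists of (value, probability) sorted by value.
    comonotone=True pairs u with u; False pairs u with 1 - u.
    """
    second = pmf2 if comonotone else pmf2[::-1]
    joint = {}
    i = j = 0
    left1, left2 = pmf1[0][1], second[0][1]
    while i < len(pmf1) and j < len(second):
        mass = min(left1, left2)              # overlap of the two u-intervals
        if mass > 1e-15:
            key = (pmf1[i][0], second[j][0])
            joint[key] = joint.get(key, 0.0) + mass
        left1 -= mass
        left2 -= mass
        if left1 <= 1e-15:                    # interval of variable 1 used up
            i += 1
            left1 = pmf1[i][1] if i < len(pmf1) else 0.0
        if left2 <= 1e-15:                    # interval of variable 2 used up
            j += 1
            left2 = second[j][1] if j < len(second) else 0.0
    return joint

def composite_pmf(pmf1, pmf2, lam0, lam1, lam2):
    """Mix independence (lam0), minimum (lam1) and maximum (lam2) correlation."""
    low = extreme_pmf(pmf1, pmf2, comonotone=False)
    high = extreme_pmf(pmf1, pmf2, comonotone=True)
    return {(v1, v2): lam0 * p1 * p2
                      + lam1 * low.get((v1, v2), 0.0)
                      + lam2 * high.get((v1, v2), 0.0)
            for v1, p1 in pmf1 for v2, p2 in pmf2}

def correlation(joint):
    """Pearson correlation of a joint pmf given as {(a, b): probability}."""
    m1 = sum(p * a for (a, b), p in joint.items())
    m2 = sum(p * b for (a, b), p in joint.items())
    cov = sum(p * (a - m1) * (b - m2) for (a, b), p in joint.items())
    v1 = sum(p * (a - m1) ** 2 for (a, b), p in joint.items())
    v2 = sum(p * (b - m2) ** 2 for (a, b), p in joint.items())
    return cov / (v1 * v2) ** 0.5

y1 = [(v, 0.2) for v in range(1, 6)]                              # U{1,...,5}
y2 = [(v, comb(3, v) * 0.4 ** v * 0.6 ** (3 - v)) for v in range(4)]  # Bin(3, 0.4)
type_l = composite_pmf(y1, y2, 0.0, 0.1715, 0.8285)
type_u = composite_pmf(y1, y2, 0.3431, 0.0, 0.6569)
```

Both mixtures have correlation close to 0.6, yet only the Type U pmf puts positive mass on every cell, which is the contrast the two figures are meant to show.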
Given a valid composition probability vector, the following procedure generates values of $Y$ based on a composite distribution in $\Omega$, with the correlation structure associated with the correlation point $\rho$.

Procedure RVAR1

1. Generate $u_1, u_2, \ldots, u_{k+1} \sim U(0, 1)$.

2. If $u_{k+1} < \lambda_0$, then for $i = 1, 2, \ldots, k$, set $y_i = F_i^{-1}(u_i)$ and go to Step 6.
   Otherwise, set $m = 1$, $\Gamma = \lambda_0 + \lambda_1$.

3. If $u_{k+1} > \Gamma$, go to Step 4. Otherwise, go to Step 5.

4. $m \leftarrow m + 1$, $\Gamma \leftarrow \Gamma + \lambda_m$. Go to Step 3.

5. Generate $y$ with $u_1$ based on $h_m(y)$.

   (a) $y_1 = F_1^{-1}(u_1)$.

   (b) $y_i = F_i^{-1}\bigl(1 + u_1(2\delta_{1i}^{m} - 1) - \delta_{1i}^{m}\bigr)$, $i = 2, 3, \ldots, k$.

6. Return $y$.

RVAR1 uses $k + 1$ random numbers per observation of $Y$. One random number, $u_{k+1}$, is used to select a constituent distribution of the composite distribution. Another random number, $u_1$, is used to generate a value of $Y_1$, and the $\delta_{1j}^{\ell}$ values determine whether $u_1$ or $1 - u_1$ is used for sampling values of $Y_2, \ldots, Y_k$ for any extreme-correlation distribution. The remaining $k - 1$ random numbers are used only for independent sampling, i.e., if $u_{k+1} < \lambda_0$. RVAR1 is designed to facilitate synchronized sampling, which is useful in many computational experiments. An alternative to RVAR1 is a more efficient procedure that generates values of $Y$ using an expected number of $2 + (k - 1)\lambda_0$ random numbers per observation, rather than the constant $k + 1$ random numbers.
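The steps of Procedure RVAR1 translate almost line for line into code. The sketch below is one possible reading, not the authors' implementation: the function name `rvar1`, the argument layout, and the representation of each extreme-correlation distribution by its tuple $(\delta_{12}, \ldots, \delta_{1k})$ are assumptions made for illustration, and a guard on $m$ is added so that rounding in the cumulative weights cannot run past the last distribution.

```python
import math
import random

def rvar1(lam, delta1, inv_cdfs, rng):
    """One observation of Y following the steps of Procedure RVAR1.

    lam      : [lam0, lam1, ..., lamM] composition probabilities.
    delta1   : delta1[m-1] = (delta_12, ..., delta_1k) for distribution m.
    inv_cdfs : list of the k inverse cdfs F_i^{-1}.
    rng      : random.Random instance supplying U(0,1) numbers.
    """
    k = len(inv_cdfs)
    u = [rng.random() for _ in range(k + 1)]       # Step 1
    if u[k] < lam[0]:                              # Step 2: independent sampling
        return [inv_cdfs[i](u[i]) for i in range(k)]
    m, gamma = 1, lam[0] + lam[1]
    while u[k] > gamma and m < len(lam) - 1:       # Steps 3 and 4
        m += 1
        gamma += lam[m]
    y = [inv_cdfs[0](u[0])]                        # Step 5(a)
    for i in range(1, k):                          # Step 5(b)
        d = delta1[m - 1][i - 1]
        # d = 1 reuses u_1; d = 0 uses 1 - u_1.
        y.append(inv_cdfs[i](1 + u[0] * (2 * d - 1) - d))
    return y                                       # Step 6

# Usage: unit-mean exponentials with the Type L weights of Example 2,
# assuming distribution 1 is the minimum-correlation one (delta_12 = 0).
expo = lambda v: -math.log(1.0 - v) if v < 1.0 else float("inf")
rng = random.Random(7)
sample = [rvar1([0.0, 0.36, 0.64], [(0,), (1,)], [expo, expo], rng)
          for _ in range(1000)]
```

Passing the generator in as `rng` keeps the $k + 1$ draws per observation explicit, which is what makes synchronized sampling across experiments straightforward.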
2.4 Explicit Correlation Induction For Trivariate Random Variables

This section introduces closed-form formulas for the unique composition probabilities for Type L distributions for trivariate random variables. These formulas are extended to other composite distributions, including Type U composite distributions. Throughout this section, it is assumed that $k = 3$, $Y = (Y_1, Y_2, Y_3)$, $y = (y_1, y_2, y_3)$, and $f_i(y_i) > 0$ for all $y_i$, $i = 1, 2, 3$.


2.4.1  Type  L  composite  distributions  for  trivariate  random 
variables 


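Define $\bar\rho_{ij} = (\rho_{ij}^{-} + \rho_{ij}^{+})/2$ for $i = 1, 2$, $j = i+1, \ldots, 3$. Let $\rho \in P$, $\lambda_0 = 0$, and

\[ \lambda_\ell = \frac{1}{4}\Bigl[\,1 + \sum_{i=1}^{2}\sum_{j=i+1}^{3} (2\delta_{ij}^{\ell} - 1)\,\frac{2(\rho_{ij} - \bar\rho_{ij})}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr], \quad \ell = 1, 2, 3, 4. \tag{2.26} \]

Consider the sets

\[ T_\ell = \Bigl\{\, \rho \Bigm| t_\ell(\rho) = 1 + \sum_{i=1}^{2}\sum_{j=i+1}^{3} (2\delta_{ij}^{\ell} - 1)\,\frac{2(\rho_{ij} - \bar\rho_{ij})}{\rho_{ij}^{+} - \rho_{ij}^{-}} = 0 \,\Bigr\}, \quad \ell = 1, 2, 3, 4, \tag{2.27} \]

and refer to Figure 2.1. All convex combinations of $q_2$, $q_3$, and $q_4$ belong to $T_1$. Similarly, all convex combinations of $q_1$, $q_3$, and $q_4$ belong to $T_2$; of $q_1$, $q_2$, and $q_4$ belong to $T_3$; and of $q_1$, $q_2$, and $q_3$ belong to $T_4$. The next proposition and the following corollary establish the relationship between the points in $P$ and the formulas (2.26).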
Proposition 2.1 For any random variable $Y$,

\[ 1 + \sum_{i=1}^{2}\sum_{j=i+1}^{3} (2\delta_{ij}^{\ell} - 1)\,\frac{2(\rho_{ij} - \bar\rho_{ij})}{\rho_{ij}^{+} - \rho_{ij}^{-}} \ge 0, \quad \ell = 1, 2, 3, 4, \tag{2.28} \]

are valid inequalities for $P$.

Proof: The set $P$ contains all correlation points that are convex combinations of the extreme-correlation points $q_1$, $q_2$, $q_3$, and $q_4$. Let $t_\ell(\rho)$ denote the left-hand side of (2.28). It may be verified that $t_1(q_1) > 0$, $t_2(q_2) > 0$, $t_3(q_3) > 0$, and $t_4(q_4) > 0$. Let $q$ be any convex combination of $q_1$, $q_2$, $q_3$, and $q_4$. Then $t_1(q) \ge 0$, $t_2(q) \ge 0$, $t_3(q) \ge 0$, and $t_4(q) \ge 0$. □
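The verification left to the reader in the proof can be written out in a few lines. The computation below is a sketch that relies only on two facts taken from the definitions above: every component of an extreme-correlation point $q_m$ equals $\rho_{ij}^{+}$ or $\rho_{ij}^{-}$, and $\delta_{23}^{m} = 1$ exactly when $\delta_{12}^{m} = \delta_{13}^{m}$ (the sampling rule in Step 5 of RVAR1). The weights $\alpha_m$ are notation introduced here for the convex combination.

```latex
% At q_m each component sits at an endpoint of its range:
\rho_{ij} - \bar\rho_{ij} = (2\delta_{ij}^{m} - 1)\,\frac{\rho_{ij}^{+} - \rho_{ij}^{-}}{2}
\quad\Longrightarrow\quad
t_\ell(q_m) = 1 + \sum_{i=1}^{2}\sum_{j=i+1}^{3} (2\delta_{ij}^{\ell} - 1)(2\delta_{ij}^{m} - 1).

% m = \ell: all three products equal 1.
t_\ell(q_\ell) = 1 + 3 = 4 > 0.

% m \ne \ell: two distinct extreme-correlation points agree in exactly one component.
t_\ell(q_m) = 1 + 1 - 1 - 1 = 0.

% t_\ell is affine, so for q = \sum_m \alpha_m q_m with \alpha_m \ge 0, \sum_m \alpha_m = 1:
t_\ell(q) = \sum_{m=1}^{4} \alpha_m\, t_\ell(q_m) = 4\alpha_\ell \ge 0.
```

The last line also shows why each $T_\ell$ consists of the convex combinations of the other three extreme-correlation points: $t_\ell(q) = 0$ exactly when $\alpha_\ell = 0$.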


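Corollary 2.1 For any random variable $Y$, the valid inequalities (2.28) are facets of $P$.

Proof: $P$ has dimension 3. Each $T_\ell$, $\ell = 1, 2, 3, 4$, has dimension 2. It follows from Proposition 2.1 and the definition of the sets $T_\ell$ that each $T_\ell$ is a face of $P$ and therefore a facet of $P$. □

The following result establishes that there is a unique solution to (2.23)-(2.24) for any $\rho \in P$ and any value of $\lambda_0$ such that $0 \le \lambda_0 \le 1$.

Proposition 2.2 For any value of $\lambda_0$ such that $0 \le \lambda_0 \le 1$, and any $\rho \in P$, there is a unique solution to (2.23)-(2.24).

Proof: Write $q_\ell^{(ij)} = \delta_{ij}^{\ell}\rho_{ij}^{+} + (1 - \delta_{ij}^{\ell})\rho_{ij}^{-}$ for the $\rho_{ij}$ component of $q_\ell$ (see Table 2.1). For any value of $\lambda_0$ such that $0 \le \lambda_0 \le 1$, the equations (2.23)-(2.24) reduce to:

\[ \lambda_1 q_1^{(12)} + \lambda_2 q_2^{(12)} + \lambda_3 q_3^{(12)} + \lambda_4 q_4^{(12)} = \rho_{12} \tag{2.29} \]
\[ \lambda_1 q_1^{(13)} + \lambda_2 q_2^{(13)} + \lambda_3 q_3^{(13)} + \lambda_4 q_4^{(13)} = \rho_{13} \tag{2.30} \]
\[ \lambda_1 q_1^{(23)} + \lambda_2 q_2^{(23)} + \lambda_3 q_3^{(23)} + \lambda_4 q_4^{(23)} = \rho_{23} \tag{2.31} \]
\[ \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1 - \lambda_0 \tag{2.32} \]

In matrix terms this can be written as $Ax = b$, where

\[ A = \begin{pmatrix} q_1^{(12)} & q_2^{(12)} & q_3^{(12)} & q_4^{(12)} \\ q_1^{(13)} & q_2^{(13)} & q_3^{(13)} & q_4^{(13)} \\ q_1^{(23)} & q_2^{(23)} & q_3^{(23)} & q_4^{(23)} \\ 1 & 1 & 1 & 1 \end{pmatrix}, \tag{2.33} \]

$x = (\lambda_1, \lambda_2, \lambda_3, \lambda_4)^{T}$, and $b = (\rho_{12}, \rho_{13}, \rho_{23}, 1 - \lambda_0)^{T}$. If $\det(A) \ne 0$, then $A^{-1}$ exists and $x = A^{-1}b$ is the unique solution. Performing elementary row and column operations on $A$ yields the following matrix:

\[ A' = \begin{pmatrix} 0 & 0 & \rho_{12}^{-} - \rho_{12}^{+} & \rho_{12}^{-} - \rho_{12}^{+} \\ 0 & \rho_{13}^{-} - \rho_{13}^{+} & 0 & \rho_{13}^{-} - \rho_{13}^{+} \\ 0 & \rho_{23}^{-} - \rho_{23}^{+} & \rho_{23}^{-} - \rho_{23}^{+} & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}. \]

It follows that $\det(A) = \det(A') = -2(\rho_{12}^{-} - \rho_{12}^{+})(\rho_{13}^{-} - \rho_{13}^{+})(\rho_{23}^{-} - \rho_{23}^{+})$. Since $\rho_{ij}^{-} < 0$ and $\rho_{ij}^{+} > 0$ for all $i < j$, every factor is nonzero and $\det(A) \ne 0$. □

The next two propositions establish that the values for $\lambda_\ell$, $\ell = 1, 2, 3, 4$, given in (2.26) and $\lambda_0 = 0$ satisfy (2.23)-(2.25), and therefore constitute unique composition probabilities.

Proposition 2.3 The values for $\lambda_\ell$, $\ell = 1, 2, 3, 4$, given in (2.26) and $\lambda_0 = 0$, satisfy (2.23).

Proof: Without loss of generality, consider equation (2.23) for any $i < j \le k$. For the pair $(i, j)$, $\delta_{ij}^{\ell} = 1$ for exactly two values of $\ell$, and over those two values the terms of (2.26) that belong to the other two pairs cancel. Hence

\[
\begin{aligned}
\sum_{\ell=1}^{4} \lambda_\ell\bigl[\delta_{ij}^{\ell}\rho_{ij}^{+} + (1 - \delta_{ij}^{\ell})\rho_{ij}^{-}\bigr]
&= \rho_{ij}^{+}\sum_{\ell=1}^{4}\delta_{ij}^{\ell}\lambda_\ell + \rho_{ij}^{-}\sum_{\ell=1}^{4}(1 - \delta_{ij}^{\ell})\lambda_\ell \\
&= \rho_{ij}^{+}\Bigl[\frac{1}{4} + \frac{1}{4} + \frac{\rho_{ij} - \bar\rho_{ij}}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr] + \rho_{ij}^{-}\Bigl[\frac{1}{4} + \frac{1}{4} - \frac{\rho_{ij} - \bar\rho_{ij}}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr] \\
&= \frac{(\rho_{ij}^{+})^{2} - (\rho_{ij}^{-})^{2} + 2\rho_{ij}(\rho_{ij}^{+} - \rho_{ij}^{-}) - 2\bar\rho_{ij}(\rho_{ij}^{+} - \rho_{ij}^{-})}{2(\rho_{ij}^{+} - \rho_{ij}^{-})} \\
&= \frac{(\rho_{ij}^{+} + \rho_{ij}^{-})(\rho_{ij}^{+} - \rho_{ij}^{-}) + (\rho_{ij}^{+} - \rho_{ij}^{-})(2\rho_{ij} - 2\bar\rho_{ij})}{2(\rho_{ij}^{+} - \rho_{ij}^{-})} \\
&= \frac{(\rho_{ij}^{+} + \rho_{ij}^{-}) + 2(\rho_{ij} - \bar\rho_{ij})}{2} \\
&= \bar\rho_{ij} + \rho_{ij} - \bar\rho_{ij} \\
&= \rho_{ij}. \qquad \Box
\end{aligned}
\]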

Proposition 2.4 For any $\rho \in P$, the values of $\lambda_\ell$, $\ell = 1, 2, 3, 4$, in (2.26) and $\lambda_0 = 0$ satisfy (2.24) and (2.25).

Proof: Let $\rho \in P$. By Proposition 2.1, the numerator of $\lambda_\ell$, $\ell = 1, 2, 3, 4$, is nonnegative. Therefore, $\lambda_\ell \ge 0$ for $\ell = 1, 2, 3, 4$. Consider $\sum_{\ell=1}^{4} \lambda_\ell$. For any $i < j$,

\[ \sum_{\ell=1}^{4} (2\delta_{ij}^{\ell} - 1)\,\frac{2(\rho_{ij} - \bar\rho_{ij})}{\rho_{ij}^{+} - \rho_{ij}^{-}} = 0, \tag{2.35} \]

so that $\sum_{\ell=1}^{4} \lambda_\ell = \sum_{\ell=1}^{4} 1/4 = 1$. □

The next two propositions establish more results regarding Type L distributions. The first result provides the conditions under which a Type L distribution exists for $\rho = q_0 \in P$, and the second result indicates that there are many Type L distributions with $\lambda_0 = 0$ if there is such a distribution for $\rho = q_0 \in P$.
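Proposition 2.5 There always exists a Type L composite distribution for $\rho = q_0 \in P$ for which $\lambda_0 = 0$ whenever

\[ \sum_{i=1}^{2}\sum_{j=i+1}^{3} (2\delta_{ij}^{\ell} - 1)\,\frac{\rho_{ij}^{+} + \rho_{ij}^{-}}{\rho_{ij}^{+} - \rho_{ij}^{-}} \le 1, \quad \ell = 1, 2, 3, 4. \tag{2.36} \]

Proof: Let $\rho = (0, 0, 0)$ and $\lambda_0 = 0$. Condition (2.36) ensures that $\lambda_\ell \ge 0$, $\ell = 1, 2, 3, 4$. □

For many distributions used in practical applications, condition (2.36) is easily satisfied. For instance, if every marginal distribution is uniform, then $\rho_{ij}^{+} + \rho_{ij}^{-} = 0$ for all $i < j \le 3$.

Proposition 2.6 If there is a Type L distribution with $\lambda_0 = 0$ associated with the correlation point $q_0 \in P$, then there is a Type L distribution with $\lambda_0 = 0$ associated with every $\rho \in P$.

Proof: Suppose there is a Type L distribution with $\lambda_0 = 0$ associated with $q_0 \in P$. Then there exist $\alpha_\ell$, $\ell = 0, 1, 2, 3, 4$, with $\alpha_0 = 0$ such that

\[ q_0 = \sum_{\ell=1}^{4} \alpha_\ell q_\ell, \qquad \sum_{\ell=1}^{4} \alpha_\ell = 1, \qquad \alpha_\ell \ge 0, \quad \ell = 1, 2, 3, 4. \tag{2.37} \]

Select any $\rho \in P$. There exists a composition probability vector, $\lambda$, such that

\[
\begin{aligned}
\rho &= \sum_{\ell=0}^{4} \lambda_\ell q_\ell \\
&= \lambda_0 q_0 + \sum_{\ell=1}^{4} \lambda_\ell q_\ell \\
&= \lambda_0 \sum_{\ell=1}^{4} \alpha_\ell q_\ell + \sum_{\ell=1}^{4} \lambda_\ell q_\ell \\
&= \sum_{\ell=1}^{4} (\lambda_0\alpha_\ell + \lambda_\ell) q_\ell \\
&= \sum_{\ell=1}^{4} \lambda'_\ell q_\ell,
\end{aligned}
\]

where $\lambda'_\ell = \lambda_0\alpha_\ell + \lambda_\ell$, $\ell = 1, 2, 3, 4$. Since

\[ \sum_{\ell=1}^{4} (\lambda_0\alpha_\ell + \lambda_\ell) = \lambda_0\sum_{\ell=1}^{4}\alpha_\ell + \sum_{\ell=1}^{4}\lambda_\ell = \lambda_0 + (1 - \lambda_0) = 1, \tag{2.38} \]

one can set $\lambda'_0 = 0$ and there is a Type L composite distribution associated with $\rho \in P$. □

Additional composition probabilities for $\rho \in P$ such that $\lambda_0 > 0$ are presented in the next subsection.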


2.4.2  Other  composite  distributions  for  trivariate  random 
variables 


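Let $\rho \in P$,

\[ d_\ell = 1 - \sum_{i=1}^{2}\sum_{j=i+1}^{3} (2\delta_{ij}^{\ell} - 1)\,\frac{2\bar\rho_{ij}}{\rho_{ij}^{+} - \rho_{ij}^{-}}, \quad \ell = 1, 2, 3, 4, \tag{2.39} \]

and

\[ \gamma^{*} = \min_{\ell}\Bigl\{\frac{4\lambda_\ell}{d_\ell}\Bigr\}, \tag{2.40} \]

where the $\lambda_\ell$ are the Type L values from (2.26). The following three propositions establish that a composite distribution for $Y$ with feasible correlation structure $\rho$ is given by

\[ h(y) = \sum_{\ell=0}^{4} \gamma_\ell h_\ell(y), \tag{2.41} \]

where $0 \le \gamma_0 \le \gamma^{*}$ and

\[ \gamma_\ell = \frac{1}{4}\Bigl[\,1 - \gamma_0 + \sum_{i=1}^{2}\sum_{j=i+1}^{3} (2\delta_{ij}^{\ell} - 1)\,\frac{2\bigl(\rho_{ij} - (1 - \gamma_0)\bar\rho_{ij}\bigr)}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr], \quad \ell = 1, 2, 3, 4. \tag{2.42} \]

Proposition 2.7 Let $\rho \in P$. The values $\gamma_\ell$, $\ell = 1, 2, 3, 4$, given in (2.42), and $\gamma_0$, given in (2.40), satisfy (2.23).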

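Proof: Without loss of generality, consider equation (2.23) for any $i < j \le k$. The independent distribution contributes zero correlation, so

\[
\begin{aligned}
\sum_{\ell=1}^{4} \gamma_\ell\bigl[\delta_{ij}^{\ell}\rho_{ij}^{+} + (1 - \delta_{ij}^{\ell})\rho_{ij}^{-}\bigr]
&= \gamma_0(0) + \rho_{ij}^{+}\sum_{\ell=1}^{4}\delta_{ij}^{\ell}\gamma_\ell + \rho_{ij}^{-}\sum_{\ell=1}^{4}(1 - \delta_{ij}^{\ell})\gamma_\ell \\
&= \rho_{ij}^{+}\Bigl[\frac{1 - \gamma_0}{2} + \frac{\rho_{ij} - (1 - \gamma_0)\bar\rho_{ij}}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr] + \rho_{ij}^{-}\Bigl[\frac{1 - \gamma_0}{2} - \frac{\rho_{ij} - (1 - \gamma_0)\bar\rho_{ij}}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr] \\
&= \frac{(\rho_{ij}^{+} + \rho_{ij}^{-})(\rho_{ij}^{+} - \rho_{ij}^{-}) - \gamma_0(\rho_{ij}^{+} + \rho_{ij}^{-})(\rho_{ij}^{+} - \rho_{ij}^{-}) + 2\rho_{ij}(\rho_{ij}^{+} - \rho_{ij}^{-})}{2(\rho_{ij}^{+} - \rho_{ij}^{-})} \\
&\qquad - \frac{2(1 - \gamma_0)\bar\rho_{ij}(\rho_{ij}^{+} - \rho_{ij}^{-})}{2(\rho_{ij}^{+} - \rho_{ij}^{-})} \\
&= \bar\rho_{ij} - \bigl(\gamma_0\bar\rho_{ij} + (1 - \gamma_0)\bar\rho_{ij}\bigr) + \rho_{ij} \\
&= \rho_{ij}. \qquad \Box
\end{aligned}
\]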

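Proposition 2.8 If $\lambda_0 = 0$ and $\lambda_\ell \ge 0$ for $\ell = 1, 2, 3, 4$, then $d_\ell \ge 0$ for $\ell = 1, 2, 3, 4$.

Proof: For any specified marginals, $d_\ell$ is a constant for all population correlation structures. Suppose that $\lambda_0 = 0$ and $\lambda_\ell \ge 0$, $\ell = 1, 2, 3, 4$. Consider any $\lambda_\ell$ for the correlation point $\rho = (0, 0, 0)$. Then,

\[ \lambda_\ell = \frac{1}{4}\Bigl[\,1 + \sum_{i=1}^{2}\sum_{j=i+1}^{3} \frac{(2\delta_{ij}^{\ell} - 1)\,2(0 - \bar\rho_{ij})}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr] = \frac{1}{4}\Bigl[\,1 - \sum_{i=1}^{2}\sum_{j=i+1}^{3} \frac{(2\delta_{ij}^{\ell} - 1)\,2\bar\rho_{ij}}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr] = \frac{1}{4} d_\ell \ge 0. \tag{2.43} \]

So $d_\ell \ge 0$, $\ell = 1, 2, 3, 4$. □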


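Proposition 2.9 If $\lambda_0 = 0$ and $\lambda_\ell \ge 0$, $\ell = 1, 2, 3, 4$, then the values of $\gamma_\ell$, $\ell = 0, 1, 2, 3, 4$, given by (2.40) and (2.42) satisfy (2.24)-(2.25).

Proof: Suppose that $\lambda_0 = 0$ and $\lambda_\ell \ge 0$, $\ell = 1, 2, 3, 4$. For every $\ell$, $\ell = 1, 2, 3, 4$,

\[ 0 \le \gamma_0 \le \frac{4\lambda_\ell}{d_\ell}. \tag{2.44} \]

It follows that

\[
\begin{aligned}
\gamma_0 d_\ell &\le 4\lambda_\ell \\
\gamma_0\Bigl[\,1 - \sum_{i=1}^{2}\sum_{j=i+1}^{3} \frac{(2\delta_{ij}^{\ell} - 1)\,2\bar\rho_{ij}}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr] &\le 1 + \sum_{i=1}^{2}\sum_{j=i+1}^{3} \frac{(2\delta_{ij}^{\ell} - 1)\,2(\rho_{ij} - \bar\rho_{ij})}{\rho_{ij}^{+} - \rho_{ij}^{-}} \\
1 - \gamma_0 + \sum_{i=1}^{2}\sum_{j=i+1}^{3} \frac{(2\delta_{ij}^{\ell} - 1)\,2\bigl(\rho_{ij} - (1 - \gamma_0)\bar\rho_{ij}\bigr)}{\rho_{ij}^{+} - \rho_{ij}^{-}} &\ge 0 \\
4\gamma_\ell &\ge 0.
\end{aligned}
\]

Therefore, $\gamma_\ell \ge 0$, $\ell = 1, 2, 3, 4$. Similar to arguments in the proof of Proposition 2.4, $\sum_{\ell=1}^{4} \gamma_\ell = 1 - \gamma_0$ so that $\gamma_0 + \sum_{\ell=1}^{4} \gamma_\ell = 1$. □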
Note that the composition probability formula (2.42) reduces to the Type L formula (2.26) when $\gamma_0 = 0$. The value of $\gamma_0$ may be thought of as an index for the composite distributions for a specified $\rho \in P$. For any $\rho \in P$, if $\lambda_\ell \ge 0$ for $\ell = 1, 2, 3, 4$, with at least one $\lambda_\ell = 0$, then $\gamma^{*} = 0$. In this case, the unique distribution for $\rho$ is both Type L and Type U. The next result shows that the composite distribution is Type U when $\gamma_0 = \gamma^{*}$.
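Formulas (2.26), (2.39), (2.40) and (2.42) are simple enough to evaluate numerically. The following is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, the ordering of the four $\delta$ rows is assumed (it only affects which weight carries which label), the $d_\ell$ used are $1 - \sum\sum (2\delta_{ij}^{\ell}-1)\,2\bar\rho_{ij}/(\rho_{ij}^{+}-\rho_{ij}^{-})$, and rows with $d_\ell \le 0$ are skipped when taking the minimum, which is a guard added here. The test point $\rho = (0.6, 0.65, 0.5)$ with exponential marginals is an illustrative choice, not an example from the text.

```python
import math

# delta rows for k = 3 in pair order (12, 13, 23). The ordering of the four
# rows is an assumption of this sketch and only affects the labels.
ROWS = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]

def composition_probs(rho, rho_minus, rho_plus, gamma0=0.0):
    """Formula (2.42) for k = 3; gamma0 = 0 gives the Type L formula (2.26)."""
    probs = []
    for row in ROWS:
        total = 1.0 - gamma0
        for d, r, lo, hi in zip(row, rho, rho_minus, rho_plus):
            mid = (lo + hi) / 2.0
            total += (2 * d - 1) * 2.0 * (r - (1.0 - gamma0) * mid) / (hi - lo)
        probs.append(total / 4.0)
    return probs

def gamma_star(rho, rho_minus, rho_plus):
    """Largest admissible gamma0: min over l of 4*lam_l / d_l, capped at 1."""
    lam = composition_probs(rho, rho_minus, rho_plus)
    best = 1.0                                   # gamma0 can never exceed 1
    for row, lam_l in zip(ROWS, lam):
        d_l = 1.0 - sum((2 * d - 1) * (lo + hi) / (hi - lo)
                        for d, lo, hi in zip(row, rho_minus, rho_plus))
        if d_l > 0:                              # guard added in this sketch
            best = min(best, 4.0 * lam_l / d_l)
    return best

# Illustrative inputs: three unit-mean exponential marginals.
lo = (1.0 - math.pi ** 2 / 6.0,) * 3
hi = (1.0,) * 3
rho = (0.6, 0.65, 0.5)
g_star = gamma_star(rho, lo, hi)
type_u = [g_star] + composition_probs(rho, lo, hi, g_star)
```

For these inputs the largest admissible weight on the independent distribution works out to 0.25, and at that value one of the four extreme-correlation weights drops to zero, which is the behaviour the next proposition relies on.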

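Proposition 2.10 When $\gamma_0 = \gamma^{*}$, composite distributions (2.41) are Type U distributions.

Proof: Let $\gamma_0 = \gamma^{*}$. Then using the composition probability formula (2.42) yields

\[ \gamma = (\gamma_0, \gamma_1, \gamma_2, \gamma_3, \gamma_4) = \Bigl(\gamma^{*},\ \lambda_1 - \frac{\gamma^{*}}{4}d_1,\ \lambda_2 - \frac{\gamma^{*}}{4}d_2,\ \lambda_3 - \frac{\gamma^{*}}{4}d_3,\ \lambda_4 - \frac{\gamma^{*}}{4}d_4\Bigr). \tag{2.45} \]

From Propositions 2.7 and 2.9, the vector $\gamma$ represents a feasible set of composition probabilities. Since $\gamma_0 = \gamma^{*}$, $\gamma_\ell = 0$ for at least one $\ell$, $\ell = 1, 2, 3, 4$. Without loss of generality, assume that $\gamma_1 = 0$. Writing $q_\ell^{(ij)} = \delta_{ij}^{\ell}\rho_{ij}^{+} + (1 - \delta_{ij}^{\ell})\rho_{ij}^{-}$ for the $\rho_{ij}$ component of $q_\ell$, $\gamma$ then satisfies (2.23)-(2.25), which reduce to

\[ \gamma_0 + \gamma_2 + \gamma_3 + \gamma_4 = 1, \tag{2.46} \]
\[ \gamma_2 q_2^{(12)} + \gamma_3 q_3^{(12)} + \gamma_4 q_4^{(12)} = \rho_{12}, \tag{2.47} \]
\[ \gamma_2 q_2^{(13)} + \gamma_3 q_3^{(13)} + \gamma_4 q_4^{(13)} = \rho_{13}, \tag{2.48} \]
\[ \gamma_2 q_2^{(23)} + \gamma_3 q_3^{(23)} + \gamma_4 q_4^{(23)} = \rho_{23}, \tag{2.49} \]
\[ \gamma_0, \gamma_2, \gamma_3, \gamma_4 \ge 0. \tag{2.50} \]

The dual of the linear program (LP) with the objective of maximizing $\gamma_0$ subject to constraints (2.46)-(2.50) is

\[ \text{Minimize } w_1 + \rho_{12}w_2 + \rho_{13}w_3 + \rho_{23}w_4 \tag{2.51} \]

subject to

\[ w_1 \ge 1, \tag{2.52} \]
\[ w_1 + q_2^{(12)}w_2 + q_2^{(13)}w_3 + q_2^{(23)}w_4 \ge 0, \tag{2.53} \]
\[ w_1 + q_3^{(12)}w_2 + q_3^{(13)}w_3 + q_3^{(23)}w_4 \ge 0, \tag{2.54} \]
\[ w_1 + q_4^{(12)}w_2 + q_4^{(13)}w_3 + q_4^{(23)}w_4 \ge 0. \tag{2.55} \]

The complementary slackness conditions (CSC) are

\[ (w_1 - 1)\gamma_0 = 0, \tag{2.56} \]
\[ (w_1 + q_2^{(12)}w_2 + q_2^{(13)}w_3 + q_2^{(23)}w_4)\gamma_2 = 0, \tag{2.57} \]
\[ (w_1 + q_3^{(12)}w_2 + q_3^{(13)}w_3 + q_3^{(23)}w_4)\gamma_3 = 0, \tag{2.58} \]
\[ (w_1 + q_4^{(12)}w_2 + q_4^{(13)}w_3 + q_4^{(23)}w_4)\gamma_4 = 0. \tag{2.59} \]

If $\gamma_0 = \gamma^{*} > 0$, then the CSC imply $w_1 = 1$. If $\gamma_0 = \gamma^{*} = 0$, then $w_1 \ge 1$. In either case, the constraints (2.53)-(2.55) and the CSC (2.57)-(2.59) are satisfied if

\[ \hat{A} = \begin{pmatrix} q_2^{(12)} & q_2^{(13)} & q_2^{(23)} \\ q_3^{(12)} & q_3^{(13)} & q_3^{(23)} \\ q_4^{(12)} & q_4^{(13)} & q_4^{(23)} \end{pmatrix} \tag{2.60} \]

is nonsingular. $\hat{A}^{T}$ is a square submatrix of the nonsingular matrix

\[ \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & q_2^{(12)} & q_3^{(12)} & q_4^{(12)} \\ 0 & q_2^{(13)} & q_3^{(13)} & q_4^{(13)} \\ 0 & q_2^{(23)} & q_3^{(23)} & q_4^{(23)} \end{pmatrix} \tag{2.61} \]

derived from (2.46)-(2.49). Because the inverse of (2.61) exists, $(\hat{A}^{T})^{-1}$ exists and $\hat{A}$ must be nonsingular. Since $\hat{A}$ is nonsingular, (2.45) is the composition probability vector for a Type U composite distribution. □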
2.4.3 Feasible correlation points for trivariate random variables


A correlation structure for a trivariate random variable has three interdependent correlation terms. The dependency among the three correlation terms means that specifying values for any two correlation terms limits the range of feasible values for the remaining correlation term (Olkin, 1981).

Assume $Y = (Y_1, Y_2, Y_3)$ with $\rho_{13} = \rho_{13}^{0}$ and $\rho_{23} = \rho_{23}^{0}$. Equations (2.26) can be solved for the remaining correlation term, $\rho_{12}$. Two equations provide upper bounds and two equations provide lower bounds on the feasible range of $\rho_{12}$. The range for $\rho_{12}$ is given by

\[ \frac{\tau(\rho_{12}, \rho_{13}^{0}, \rho_{23}^{0})\,(\rho_{12}^{+} - \rho_{12}^{-})}{2} + \bar\rho_{12} \;\le\; \rho_{12} \;\le\; \frac{\eta(\rho_{12}, \rho_{13}^{0}, \rho_{23}^{0})\,(\rho_{12}^{+} - \rho_{12}^{-})}{2} + \bar\rho_{12}, \tag{2.62} \]

where

\[ \tau(\rho_{12}, \rho_{13}^{0}, \rho_{23}^{0}) = \max\Bigl\{ -1 \pm \Bigl( \frac{2(\rho_{13}^{0} - \bar\rho_{13})}{\rho_{13}^{+} - \rho_{13}^{-}} + \frac{2(\rho_{23}^{0} - \bar\rho_{23})}{\rho_{23}^{+} - \rho_{23}^{-}} \Bigr) \Bigr\}, \tag{2.63} \]

and

\[ \eta(\rho_{12}, \rho_{13}^{0}, \rho_{23}^{0}) = \min\Bigl\{ 1 \pm \Bigl( \frac{2(\rho_{13}^{0} - \bar\rho_{13})}{\rho_{13}^{+} - \rho_{13}^{-}} - \frac{2(\rho_{23}^{0} - \bar\rho_{23})}{\rho_{23}^{+} - \rho_{23}^{-}} \Bigr) \Bigr\}. \tag{2.64} \]

Example 3: Suppose that $Y = (Y_1, Y_2, Y_3)$ and each of the $Y_i$, $i = 1, 2, 3$, is a negative exponential random variable with $\rho_{12} = 0.6$ and $\rho_{13} = 0.65$ specified. Applying (2.62)-(2.64) gives the range $0.25 \le \rho_{23} \le 0.95$ to ensure $\rho = (\rho_{12}, \rho_{13}, \rho_{23}) \in P$. □
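The bounds in Example 3 follow from requiring each of the four Type L weights in (2.26) to be nonnegative. The sketch below is an illustration, not the authors' code: the function name `feasible_range` is hypothetical, and the multipliers are taken as $\max\{-1 \pm (c_a + c_b)\}$ for the lower bound and $\min\{1 \pm (c_a - c_b)\}$ for the upper bound, where $c = 2(\rho - \bar\rho)/(\rho^{+} - \rho^{-})$ for each specified term. That form is obtained here by solving (2.26) for the remaining term and is an assumption about how (2.63)-(2.64) read.

```python
import math

def feasible_range(rho_a, rho_b, bounds_a, bounds_b, bounds_c):
    """Feasible range of the third correlation term of a trivariate variable.

    rho_a, rho_b : the two specified correlation terms.
    bounds_x     : (rho_minus, rho_plus) for each term; bounds_c belongs to
                   the unspecified term.
    """
    def scaled(rho, bounds):
        lo, hi = bounds
        return 2.0 * (rho - (lo + hi) / 2.0) / (hi - lo)

    ca, cb = scaled(rho_a, bounds_a), scaled(rho_b, bounds_b)
    tau = max(-1.0 + (ca + cb), -1.0 - (ca + cb))   # lower-bound multiplier
    eta = min(1.0 + (ca - cb), 1.0 - (ca - cb))     # upper-bound multiplier
    lo, hi = bounds_c
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    return tau * half + mid, eta * half + mid

# Example 3: negative exponential marginals, rho_12 = 0.6 and rho_13 = 0.65.
expo = (1.0 - math.pi ** 2 / 6.0, 1.0)
low_23, high_23 = feasible_range(0.6, 0.65, expo, expo, expo)
```

Because the four sign patterns in (2.26) are symmetric in the three correlation terms, the same function can be applied whichever two terms are specified, as in Example 3 where the unspecified term is $\rho_{23}$.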
2.5 Extensions for General Multivariate Random Variables


In this section, limitations of extending the composition probability formulas (2.26) and (2.42) for trivariate random variables to general multivariate random variables are discussed. Examples are provided to demonstrate these limitations.

A convenient approach to finding composition probabilities for Type L composite distributions when $k \ge 4$ might involve an extended form of (2.26) with

\[ \lambda_\ell = \frac{1}{2^{k-1}}\Bigl[\,1 + \sum_{i=1}^{k-1}\sum_{j=i+1}^{k} (2\delta_{ij}^{\ell} - 1)\,\frac{2(\rho_{ij} - \bar\rho_{ij})}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr], \quad \ell = 1, 2, \ldots, 2^{k-1}, \tag{2.65} \]

and $\lambda_0 = 0$. Proposition 2.3 may be extended to prove that for $k \ge 4$ values based on (2.65) satisfy (2.23), and it may be shown that $\sum_{\ell=1}^{2^{k-1}} \lambda_\ell = 1$ if $\lambda_0 = 0$. For some $\rho \in P$, the values suggested by (2.65) are valid composition probabilities. However, for other $\rho \in P$, the values suggested by (2.65) violate (2.25).

Example 4. Let $Y = (Y_1, Y_2, Y_3, Y_4)$ where $Y_i$, $i = 1, 2, 3, 4$, are identical discrete uniform random variables so that $\rho_{ij}^{+} = 1.0 = -\rho_{ij}^{-}$ for all $i < j \le 4$. Suppose that the desired correlation matrix is

\[ R_1 = \begin{pmatrix} 1 & 0 & \rho_{13}^{+}/4 & \rho_{14}^{-}/16 \\ 0 & 1 & \rho_{23}^{+}/8 & 0 \\ \rho_{13}^{+}/4 & \rho_{23}^{+}/8 & 1 & \rho_{34}^{-}/8 \\ \rho_{14}^{-}/16 & 0 & \rho_{34}^{-}/8 & 1 \end{pmatrix}. \tag{2.66} \]

Suppose that $\lambda_0 = 0$. Then, the $\lambda_\ell$ values suggested by (2.65) are: $\lambda_1 = 13/128$; $\lambda_2, \lambda_4 = 15/128$; $\lambda_3 = 21/128$; $\lambda_5 = 9/128$; $\lambda_6 = 11/128$; $\lambda_7 = 25/128$; and $\lambda_8 = 19/128$. In this case, the application of (2.65) yields valid composition probabilities. □
Example 5. Recall Example 4. Suppose that, instead of $R_1$, the desired correlation matrix is represented by a correlation point nearer to an extreme-correlation point. For instance, suppose the correlation matrix is

\[ \begin{pmatrix} 1 & 0.9 & 0.9 & 0.9 \\ 0.9 & 1 & 0.9 & 0.9 \\ 0.9 & 0.9 & 1 & 0.9 \\ 0.9 & 0.9 & 0.9 & 1 \end{pmatrix}. \tag{2.67} \]

Let $\lambda_0 = 0$. The values suggested by (2.65) are: $\lambda_1, \lambda_4, \lambda_6, \lambda_7 = 1/8$; $\lambda_2, \lambda_3, \lambda_5 = -1/10$; and $\lambda_8 = 8/10$. Solving an LP with the objective of minimizing $\lambda_0$ to obtain a feasible $\lambda$ for a Type L composite distribution yields: $\lambda_1, \lambda_4, \lambda_6, \lambda_7 = 1/40$; $\lambda_0, \lambda_2, \lambda_3, \lambda_5 = 0$; and $\lambda_8 = 9/10$. □
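The weights quoted in Examples 4 and 5 can be reproduced by evaluating (2.65) with exact rational arithmetic. The sketch below is illustrative and rests on labeled assumptions: the helper names are hypothetical; the $2^{k-1}$ rows of $\delta$ values are generated by counting in binary over $(\delta_{12}, \ldots, \delta_{1k})$ with $\delta_{ij} = 1$ exactly when $\delta_{1i} = \delta_{1j}$; and the off-diagonal entries used for Example 4 are $\rho_{13} = 1/4$, $\rho_{14} = -1/16$, $\rho_{23} = 1/8$, $\rho_{34} = -1/8$ (with $\rho_{12} = \rho_{24} = 0$), which is the sign choice that yields the quoted weights.

```python
from fractions import Fraction
from itertools import combinations

def delta_table(k):
    """Rows of delta_ij in pair order 12, 13, ..., (k-1)k (assumed ordering)."""
    rows = []
    for code in range(2 ** (k - 1)):
        # same[i] is 1 when Y_{i+1} is sampled from u_1, 0 when from 1 - u_1.
        same = [1] + [(code >> (k - 2 - b)) & 1 for b in range(k - 1)]
        rows.append([1 if same[i] == same[j] else 0
                     for i, j in combinations(range(k), 2)])
    return rows

def weights_265(rho, rho_minus, rho_plus, k):
    """Candidate weights lam_1..lam_{2^(k-1)} from formula (2.65)."""
    weights = []
    for row in delta_table(k):
        total = Fraction(1)
        for d, r, lo, hi in zip(row, rho, rho_minus, rho_plus):
            total += (2 * d - 1) * 2 * (r - (lo + hi) / 2) / (hi - lo)
        weights.append(total / 2 ** (k - 1))
    return weights

lo = [Fraction(-1)] * 6          # identical discrete uniforms: rho_minus = -1
hi = [Fraction(1)] * 6           # and rho_plus = +1 for every pair
example4 = weights_265([Fraction(0), Fraction(1, 4), Fraction(-1, 16),
                        Fraction(1, 8), Fraction(0), Fraction(-1, 8)], lo, hi, 4)
example5 = weights_265([Fraction(9, 10)] * 6, lo, hi, 4)
```

The Example 5 output contains three negative entries, which is exactly the violation of (2.25) that motivates the adjustment procedure developed later in the chapter.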

The following generalizations of Propositions 2.5 and 2.6 indicate when the formula (2.65) will provide valid composition probabilities.

Proposition 2.11 There always exists a Type L composite distribution for $\rho = q_0 \in P$ for which $\lambda_0 = 0$ whenever

\[ \sum_{i=1}^{k-1}\sum_{j=i+1}^{k} (2\delta_{ij}^{\ell} - 1)\,\frac{\rho_{ij}^{+} + \rho_{ij}^{-}}{\rho_{ij}^{+} - \rho_{ij}^{-}} \le 1, \quad \ell = 1, 2, \ldots, 2^{k-1}. \tag{2.68} \]

Proof: Let $\rho = (0, 0, \ldots, 0)$ be the $\binom{k}{2}$-component zero vector and $\lambda_0 = 0$. Condition (2.68) ensures that $\lambda_\ell \ge 0$, $\ell = 1, 2, \ldots, 2^{k-1}$. □


Proposition 2.12 If there is a Type L distribution with $\lambda_0 = 0$ associated with the correlation point $q_0 \in P$, then there is a Type L distribution with $\lambda_0 = 0$ associated with every $\rho \in P$.

Proof: The proof of this proposition is virtually identical to the proof of Proposition 2.6. □

Table 2.5: Upper bound on $\lambda_\ell$ as $k$ increases

$k$   Number of $\rho_{ij}$s   Number of $q_\ell$s   Maximum $\lambda_\ell$
2     1                        2                     1.0000
3     3                        4                     1.0000
4     6                        8                     0.8750
5     10                       16                    0.6875
6     15                       32                    0.5000

The formula (2.65) does not always yield valid composition probabilities because, for $k > 3$, the numerators of (2.65) do not define facets of $P$. Further, the number of extreme-correlation points $q_\ell$, $2^{k-1}$, grows faster with $k$ than the number of $\rho_{ij}$s, $\binom{k}{2}$. This means the denominator of (2.65) grows faster than the numerator, and the maximum value of any $\lambda_\ell$ decreases with increasing $k$ as seen in Table 2.5. For example, when $k \ge 4$ and $\rho = q_\ell$, for any $\ell = 1, 2, \ldots, 2^{k-1}$, it does not follow from application of (2.65) that $\lambda_\ell = 1$ as is expected.

Although the composition probabilities (2.26) are not readily extendable to distributions of general multivariate random variables, the values obtained through application of (2.65) can be used in conjunction with a composition weight adjustment method to provide valid composition probabilities when $k = 4$.
2.6 Composite Distributions for Quadravariate Random Variables

In this section composite distributions for quadravariate random variables are constructed by adjusting the composition weights obtained from (2.65). The adjustment process is described first and then a procedure implementing the adjustment process is presented.


2.6.1 Adjusting Composition Weight Vectors

Assume for $k \ge 4$ and population correlation structure $\rho \in P$, that a vector of composition weights $\lambda$ from (2.65) is not nonnegative (i.e., $\lambda$ violates (2.25)). One way to obtain a vector of composition probabilities is to adjust $\lambda$ to satisfy (2.23)-(2.25).

Let $i^{*} = \arg\max_\ell\{\lambda_\ell\}$ and $j^{*} = \arg\min_\ell\{\lambda_\ell\}$. Suppose that $\lambda_{j^{*}} < 0$ so that (2.25) is violated. Assume that $\lambda_0$ will not be changed. One can partition $\{\lambda_1, \lambda_2, \ldots, \lambda_8\}$ by defining index sets $L$ and $R$ such that the indices in $L$ identify elements of $\lambda$ that decrease in value while indices in $R$ identify elements of $\lambda$ that increase in value. Sets $L$ and $R$ effectively partition the rows in Table 2.4 in such a way that for any column in the table, an equal number of 1s (and thus 0s) are in $L$ and $R$. Clearly, this requires $|L| = |R| = 4$. Any offsetting adjustments to $\lambda$ based on the index sets $L$ and $R$ produce an alternative $\lambda$ that still satisfies (2.23)-(2.24). To ensure that the alternative $\lambda$ also satisfies (2.25), it must be that $j^{*} \in R$ and that each adjustment be at least $-\lambda_{j^{*}}$.
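Define $\delta_{1\cdot}^{\ell} = (\delta_{12}^{\ell}, \delta_{13}^{\ell}, \delta_{14}^{\ell})$ for each vector $\delta^{\ell}$ and define the degree of agreement of $\delta_{1\cdot}^{\ell}$ as

\[ D(\delta_{1\cdot}^{\ell}) = \sum_{i=2}^{4} \bigl(1 - |\delta_{1i}^{\ell} - 1|\bigr), \quad \ell = 1, 2, \ldots, 8. \tag{2.69} \]

The degree of agreement function (2.69) provides a convenient method of determining $L$ and $R$:

\[ L = \{\ell \mid D(\delta_{1\cdot}^{\ell}) \in \{0, 2\},\ \ell = 1, 2, \ldots, 8\}, \tag{2.70} \]

and

\[ R = \{\ell \mid \ell \notin L\}. \tag{2.71} \]

Conveniently, for $k = 4$, $L = (\lambda_1, \lambda_4, \lambda_6, \lambda_7)$ and $R = (\lambda_2, \lambda_3, \lambda_5, \lambda_8)$, or vice versa.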

Given a vector $\lambda$ from (2.65) that violates (2.25), one may use Procedure ADJUST to raise $\lambda_{j^{*}}$ to zero.

Procedure ADJUST

1. Let $i^{*} = \arg\max_\ell\{\lambda_\ell\}$, $j^{*} = \arg\min_\ell\{\lambda_\ell\}$, and $\epsilon = -\lambda_{j^{*}}$.

2. Define $L$ and $R$ according to (2.70) and (2.71).

3. For $\ell = 1, 2, \ldots, 8$, do

   (a) If $\ell \in L$ then $\lambda_\ell = \lambda_\ell - \epsilon$

   (b) If $\ell \in R$ then $\lambda_\ell = \lambda_\ell + \epsilon$

4. Return.

In some cases, the adjusted set of weights is the same as would be obtained using an LP to find a Type L distribution.

Example 6. Recall Example 5. Procedure ADJUST returns: $\lambda_1, \lambda_4, \lambda_6, \lambda_7 = 1/40$; $\lambda_2, \lambda_3, \lambda_5 = 0$; and $\lambda_8 = 9/10$ with $\lambda_0 = 0$ based on $\epsilon = 1/10$. □
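Procedure ADJUST is short enough to run by hand, but a small sketch makes the bookkeeping explicit. The code below is one reading of the procedure, not the authors' implementation: the function name `adjust` is hypothetical, the two index sets are the ones quoted in the text for $k = 4$, and the "or vice versa" remark is interpreted as letting whichever set contains $j^{*}$ play the role of $R$. The function returns `None` when a negative weight remains, which the text treats as evidence that $\rho \notin P$.

```python
from fractions import Fraction

GROUP_A = (1, 4, 6, 7)   # index sets quoted in the text for k = 4
GROUP_B = (2, 3, 5, 8)

def adjust(weights):
    """Shift weight between the two index sets until the minimum is zero.

    weights: [lam_1, ..., lam_8]; lam_0 is left unchanged and not passed in.
    Returns the adjusted list, or None when some weight is still negative.
    """
    j_star = min(range(8), key=lambda i: weights[i]) + 1     # Step 1
    eps = -weights[j_star - 1]
    if eps <= 0:
        return list(weights)                                 # nothing to fix
    # Step 2: the set containing j* is the one whose members increase.
    raise_set = GROUP_A if j_star in GROUP_A else GROUP_B
    adjusted = [w + eps if (i + 1) in raise_set else w - eps  # Step 3
                for i, w in enumerate(weights)]
    return adjusted if min(adjusted) >= 0 else None          # Step 4

# Example 5 weights from (2.65); the result matches Example 6.
example5 = [Fraction(1, 8), Fraction(-1, 10), Fraction(-1, 10), Fraction(1, 8),
            Fraction(-1, 10), Fraction(1, 8), Fraction(1, 8), Fraction(4, 5)]
example6 = adjust(example5)
```

Because both sets contain the same number of 1s in every column of the $\delta$ table, the offsetting shifts leave the induced correlations and the total weight unchanged; only nonnegativity needs to be rechecked afterwards.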

If $\lambda$ violates (2.25) after ADJUST, then we conclude that $\rho \notin P$. This does not imply that $\rho$ does not represent a valid correlation structure, but rather that $\rho$ may not be expressed as a convex combination of the points $q_\ell$, $\ell = 0, 1, \ldots, 2^{k-1}$.

Example 7. Consider normal random variables, $Y_i$, $i = 1, 2, 3, 4$, and

\[ \rho = (0.959,\ 0.979,\ -0.904,\ 0.951,\ -0.819,\ -0.879). \tag{2.72} \]

The determinant of the corresponding correlation matrix is 0.000492; so it represents a valid correlation structure. Using (2.65) yields: $\gamma_1 = -0.097625$, $\gamma_2 = 0.100875$, $\gamma_3 = 0.129125$, $\gamma_4 = -0.111875$, $\gamma_5 = 0.109125$, $\gamma_6 = -0.101875$, $\gamma_7 = 0.811375$, and $\gamma_8 = 0.160875$. Then $\arg\min_\ell\{\gamma_\ell\} = 4$, $2 \in R$, and $4 \in L$. Since $|\gamma_4| > |\gamma_2|$, ADJUST does not return a valid set of probability weights. Therefore, $\rho \notin P$, a fact easily verified using an LP. □

A very similar process may be applied to the construction of non-Type L composite distributions, i.e., composite distributions with $\lambda_0 > 0$. Extending formula (2.42) to the quadravariate case gives:

\[ \gamma_\ell = \frac{1}{8}\Bigl[\,1 - \gamma_0 + \sum_{i=1}^{3}\sum_{j=i+1}^{4} (2\delta_{ij}^{\ell} - 1)\,\frac{2\bigl(\rho_{ij} - (1 - \gamma_0)\bar\rho_{ij}\bigr)}{\rho_{ij}^{+} - \rho_{ij}^{-}}\Bigr], \quad \ell = 1, 2, \ldots, 8, \tag{2.73} \]

for $0 \le \gamma_0 \le \gamma^{*}$. The formula for computing $\gamma^{*}$ for trivariate random variables given in §2.4.2 may be applied to each of the trivariate marginal distributions of $Y$ to determine $\gamma^{*}$ when $k = 4$.

The trivariate marginal random variables for $Y$ are $(Y_1, Y_2, Y_3)$, $(Y_1, Y_2, Y_4)$, $(Y_1, Y_3, Y_4)$, and $(Y_2, Y_3, Y_4)$. Each of the four trivariate marginal distributions for a quadravariate random variable is a valid trivariate distribution for three of the four components of $Y$. For a specified $\rho$, let $\gamma_{(1)}^{*}$, $\gamma_{(2)}^{*}$, $\gamma_{(3)}^{*}$, and $\gamma_{(4)}^{*}$ be the respective $\gamma^{*}$ values for each trivariate marginal random variable and

\[ \gamma^{*} = \min\{\gamma_{(1)}^{*},\ \gamma_{(2)}^{*},\ \gamma_{(3)}^{*},\ \gamma_{(4)}^{*}\}. \tag{2.74} \]
Let $0 \le \gamma_0 \le \gamma^{*}$. Whenever (2.73) returns a $\gamma$ vector that violates (2.25), apply Procedure ADJUST to adjust the vector of composition weights, leaving $\gamma_0$ unchanged.

Example 8. Let $Y = (Y_1, Y_2, Y_3, Y_4)$, where each $Y_i$, $i = 1, 2, 3, 4$, is negative exponential with arbitrary means and let


be the desired correlation matrix. For the trivariate margin $(Y_1, Y_2, Y_3)$, $\gamma_{(1)}^{*} = 0.45$; for $(Y_1, Y_2, Y_4)$, $\gamma_{(2)}^{*} = 0.60$; for $(Y_1, Y_3, Y_4)$, $\gamma_{(3)}^{*} = 0.0338$; and for $(Y_2, Y_3, Y_4)$, $\gamma_{(4)}^{*} = 0.37$, so that $\gamma^{*} = 0.0338$. Using $\gamma_0 = 0.0338$, application of (2.73) yields: $\gamma_1 = 0.3746$, $\gamma_2 = 0.2383$, $\gamma_3 = 0.1319$, $\gamma_4 = -0.0054$, $\gamma_5 = 0.1076$, $\gamma_6 = 0.0493$, $\gamma_7 = 0.0645$ and $\gamma_8 = 0.0054$. Procedure ADJUST returns $\gamma_0 = 0.0338$, $\gamma_1 = 0.37996$, $\gamma_2 = 0.2329$, $\gamma_3 = 0.1265$, $\gamma_5 = 0.1022$, $\gamma_6 = 0.0547$, $\gamma_7 = 0.0699$ and $\gamma_4, \gamma_8 = 0.0$, based on $\epsilon = 0.0054$. These results agree with LP results. □

As explained and demonstrated before, if $\gamma$ violates (2.25) after ADJUST is executed, then we conclude that $\rho \notin P$.


2.6.2  Choosing  feasible  quadravariate  correlation  points 

It is useful to be able to choose feasible correlation structures $\rho \in P$ for a
quadravariate random variable. A distribution for a quadravariate random variable
has six correlation terms, and each term is associated with two of the four
trivariate marginal distributions. The value specified for any $\rho_{ij}$, $i < j \le 4$, must
be feasible in both trivariate marginal distributions involving $\rho_{ij}$. For example,
any value for $\rho_{23}$ must be feasible with respect to the trivariate marginal
distributions of both $(Y_1, Y_2, Y_3)$ and $(Y_2, Y_3, Y_4)$. Determining the feasible range
for any $\rho_{ij}$ involves applying (2.62)-(2.64) to the appropriate trivariate marginal
distributions.

Example 9. Let $Y = (Y_1, Y_2, Y_3, Y_4)$ where each $Y_i$ is negative exponential.
Suppose that $\rho_{12} = 0.6$, $\rho_{13} = 0.65$, and $\rho_{14} = -0.25$ have been specified. The
remaining correlation terms must satisfy

$$0.25 \le \rho_{23} \le 0.95, \tag{2.76}$$
$$-0.6397 \le \rho_{24} \le 0.15, \tag{2.77}$$
$$-0.6 \le \rho_{34} \le 0.1. \tag{2.78}$$

Ranges (2.76)-(2.78) are necessary but not sufficient to completely specify $\rho$. Once
any one of $\rho_{23}$, $\rho_{24}$, or $\rho_{34}$ is specified, the remaining two ranges must be
recomputed. □

2.7  Summary  and  Discussion 

This research has produced a characterization of composite distributions for
multivariate random variables with a specified Pearson product-moment correlation
structure using the extreme-correlation distributions and the joint distribution
under independence. Type L and Type U distributions represent special types of
composite distributions with extreme levels of independent sampling and they
define a range of possible composite distributions for a specified correlation structure.

Explicit correlation induction based on composite distributions opens up many
new avenues of research, both theoretical and empirical, concerning multivariate
sampling and the influence of correlation structure in many types of systems. In
future theoretical work, closed-form composition probabilities for higher dimensional
random variables could be developed. One could also develop composite
distributions based on other measures of dependency, such as Spearman rank correlation
or positive regression dependence. There are many empirical research opportunities
too. For instance, in the next chapter, two-dimensional knapsack problems are
generated based on multivariate composite distributions and the rank correlation
induction method of Iman and Conover (1982) to examine the influence of different
types of correlation structures on the performance of solution procedures.
Computational experiments should also be conducted on other types of optimization
problems. In addition, composite distributions could be used in the simulation of
tandem queueing and manufacturing systems. More analytically based applications
could address issues in the design of experiments or variance reduction for
simulation applications involving multiple random number streams and multiple
measures of performance.

Appendix - Type L and Type U distributions for bivariate
random variables


Type  L  Distributions 

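Consider the following LP for determining mixing probabilities for a Type L
distribution for a bivariate random variable:

$$\min\; z = \lambda_0 \tag{2.79}$$

subject to

$$\rho^- \lambda_1 + \rho^+ \lambda_2 = \rho^0, \tag{2.80}$$
$$\lambda_0 + \lambda_1 + \lambda_2 = 1, \tag{2.81}$$
$$\lambda_0, \lambda_1, \lambda_2 \ge 0. \tag{2.82}$$

A basic feasible solution to (2.80)-(2.82) with $\lambda_0 = 0$ is given by:

$$\begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix}
= \begin{pmatrix} \rho^- & \rho^+ \\ 1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} \rho^0 \\ 1 \end{pmatrix}
= \frac{1}{\rho^- - \rho^+} \begin{pmatrix} 1 & -\rho^+ \\ -1 & \rho^- \end{pmatrix} \begin{pmatrix} \rho^0 \\ 1 \end{pmatrix}
= \begin{pmatrix} (\rho^+ - \rho^0)/(\rho^+ - \rho^-) \\ (\rho^0 - \rho^-)/(\rho^+ - \rho^-) \end{pmatrix} \ge 0, \tag{2.83}$$

which consists of the composition probabilities given in §2.2.1. The complementary
dual solution for extreme mixtures is dual feasible since

$$(0, 0)\, \frac{1}{\rho^- - \rho^+} \begin{pmatrix} 1 & -\rho^+ \\ -1 & \rho^- \end{pmatrix}
\begin{pmatrix} 0 & \rho^- & \rho^+ \\ 1 & 1 & 1 \end{pmatrix} = (0, 0, 0) \le (1, 0, 0). \tag{2.84}$$

Therefore, the composition probabilities given in (2.83) and $\lambda_0 = 0$ are the mixing
probabilities for a Type L composite distribution.

Type U Distributions

Now consider the following LP for determining composition probabilities for a
Type U distribution for a bivariate random variable:

$$\max\; z = \lambda_0 \tag{2.85}$$

subject to (2.80)-(2.82). Suppose $0 < \rho^0 < \rho^+$. A basic feasible solution to
(2.80)-(2.82) with $\lambda_1 = 0$ is

$$\begin{pmatrix} \lambda_0 \\ \lambda_2 \end{pmatrix}
= \begin{pmatrix} 0 & \rho^+ \\ 1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} \rho^0 \\ 1 \end{pmatrix}
= \begin{pmatrix} -1/\rho^+ & 1 \\ 1/\rho^+ & 0 \end{pmatrix} \begin{pmatrix} \rho^0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 - \rho^0/\rho^+ \\ \rho^0/\rho^+ \end{pmatrix}. \tag{2.86}$$

The complementary dual solution is dual feasible because

$$(1, 0) \begin{pmatrix} -1/\rho^+ & 1 \\ 1/\rho^+ & 0 \end{pmatrix}
\begin{pmatrix} 0 & \rho^- & \rho^+ \\ 1 & 1 & 1 \end{pmatrix}
= (1,\; -\rho^-/\rho^+ + 1,\; 0) \ge (1, 0, 0). \tag{2.87}$$

Therefore, the composition probabilities in (2.86) and $\lambda_1 = 0$ are the mixing
probabilities for a Type U distribution when $0 < \rho^0 < \rho^+$. A similar argument may be
used to show that $\lambda_0 = 1 - \rho^0/\rho^-$, $\lambda_1 = \rho^0/\rho^-$, and $\lambda_2 = 0$ is an optimal solution
to (2.85), (2.80)-(2.82) when $\rho^- < \rho^0 < 0$.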
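
The closed forms in (2.83) and (2.86) are easy to check numerically. The following is a minimal sketch, not code from the original study: the function names are hypothetical, and the extreme correlations used in the usage lines (-0.6449 and 1.0) are illustrative values chosen for this example only. It assumes $\rho^- < 0 < \rho^+$ and $\rho^- \le \rho^0 \le \rho^+$.

```python
def type_l(rho0, rho_min, rho_max):
    """Mixing probabilities (lam0, lam1, lam2) with lam0 = 0, following (2.83)."""
    span = rho_max - rho_min
    return (0.0, (rho_max - rho0) / span, (rho0 - rho_min) / span)


def type_u(rho0, rho_min, rho_max):
    """Mixing probabilities with lam0 as large as possible, following (2.86)
    for rho0 >= 0 and the mirrored solution for rho0 < 0."""
    if rho0 >= 0:
        return (1.0 - rho0 / rho_max, 0.0, rho0 / rho_max)
    return (1.0 - rho0 / rho_min, rho0 / rho_min, 0.0)


def mixture_correlation(lam, rho_min, rho_max):
    """Left-hand side of constraint (2.80); the independence term contributes 0."""
    return lam[1] * rho_min + lam[2] * rho_max


# Illustrative (assumed) extreme correlations and target correlation.
RHO_MIN, RHO_MAX, RHO0 = -0.6449, 1.0, 0.3
lam_l = type_l(RHO0, RHO_MIN, RHO_MAX)
lam_u = type_u(RHO0, RHO_MIN, RHO_MAX)
print("Type L:", lam_l, "Type U:", lam_u)
```

Both weight vectors reproduce the same target correlation; they differ only in how much probability is placed on independent sampling.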

CHAPTER III

THE EFFECTS OF COEFFICIENT
CORRELATION STRUCTURE IN
TWO-DIMENSIONAL KNAPSACK
PROBLEMS ON SOLUTION PROCEDURE
PERFORMANCE


3.1  Introduction 

This chapter presents an empirical study that examines the influence of correlation
structure between the coefficients in synthetic optimization problems on solution
procedure performance. One reason for empirical testing of solution procedures
is to overcome the limitations inherent in deductive, analytical techniques like
worst-case and average-case performance analyses, which often require very strong
assumptions to ensure mathematical tractability of the results. Hooker (1994) sees
the ability of deductive approaches in their current state as “inadequate to its task,”
and he views computational testing as the only currently viable alternative.
Understanding the influence of correlation structure between the types of coefficients
on solution procedure performance is an example where current deductive analysis
methods are inadequate.

In many empirical studies of optimization algorithms or heuristics, randomly
generated, or synthetic, problems are assumed to be representative of real-world
problem instances. However, defining a truly representative set of problems is
difficult. The usual practice is to vary the values of each factor systematically
across some range, thereby covering a variety of problem instances, some of which
may resemble real problem instances. Any inferences drawn
for the entire set of test problems are assumed to apply to problem instances
encountered in practice.

For certain classes of optimization problems, such as the multidimensional
knapsack problem (MKP), and in particular the two-dimensional knapsack problem
(2KP) studied here, the test problem coefficients should be generated by sampling
from joint distributions of multivariate random variables. In this study, 2KP
coefficients are generated based on a variety of correlation structures. In addition,
the type of correlation measure (Pearson product-moment and Spearman rank
correlation measures) used as the basis for generating the coefficients is varied.
Solution procedure performance results are then examined to assess how correlation
structure influences the performance of an algorithm and a heuristic.

Some computational studies have been conducted on test problems in which
correlation is induced between the objective function and constraint coefficients.
The absolute correlation level has been linked to performance differences for
solution procedures. By generating test problems based on a multivariate distribution,
the effect of the correlation between the coefficients in different constraints, as
well as the correlations between the objective and constraint coefficients, may be
assessed.

In §3.2, studies involving synthetic optimization test problems with correlation
induced among the coefficient types are reviewed and past research involving
the MKP is summarized. The test problem generation methodologies used in the
present study are discussed in §3.3, and the design of the experiment and the
analysis methods used in this study are presented in §3.4. Differences in sample
distributions due to the correlation induction method are examined in §3.5. The
influence of each type of correlation measure on solution procedure performance
is discussed in §3.6. Computational results for CPLEX, a branch-and-bound
procedure, are discussed in §3.7, while §3.8 provides a similar analysis of the results
for the heuristic by Toyoda (1975). There is a brief discussion in §3.9 of how test
problem generation parameters influence the size of the LP-IP gap in the synthetic
test problems. Finally, §3.10 contains a discussion and concluding remarks.


3.2  Background 

This section begins with an introduction to MKP, a class of problems of which 2KP
is a special case. A review of previous computational studies involving synthetic
optimization problems with correlation induced among the coefficient types follows
in §3.2.2. Results of some previous MKP studies are summarized in §3.2.3.


3.2.1  The  multidimensional  knapsack  problem 

MKP is a 0-1 programming problem of the following form:

$$\text{Maximize} \quad z = \sum_{j=1}^{n} c_j x_j \tag{3.1}$$
$$\text{subject to} \quad \sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad i = 1, 2, \ldots, m, \tag{3.2}$$
$$x_j = 0 \text{ or } 1, \quad j = 1, 2, \ldots, n, \tag{3.3}$$

where all $c_j > 0$ and all $a_{ij} \ge 0$. Additionally, at least one $a_{ij} > 0$ for each $j$.
This general form applies to a wide variety of optimization applications, including
capital budgeting problems. A special case of MKP is 2KP, where $m = 2$.
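
To make formulation (3.1)-(3.3) concrete, the sketch below evaluates and solves a tiny 2KP instance by complete enumeration. It is only an illustration: the function names and the four-variable instance are made up for this example, and enumeration is practical only for very small $n$ (the study itself uses branch-and-bound and a heuristic).

```python
from itertools import product


def mkp_value(c, x):
    """Objective (3.1): total value of the selected variables."""
    return sum(cj * xj for cj, xj in zip(c, x))


def mkp_feasible(a, b, x):
    """Constraints (3.2): every row of a must respect its right-hand side."""
    return all(
        sum(aij * xj for aij, xj in zip(row, x)) <= bi for row, bi in zip(a, b)
    )


def solve_by_enumeration(c, a, b):
    """Try all 2^n binary vectors (3.3) and keep the best feasible one."""
    best_x, best_z = None, -1
    for x in product((0, 1), repeat=len(c)):
        if mkp_feasible(a, b, x):
            z = mkp_value(c, x)
            if z > best_z:
                best_x, best_z = x, z
    return best_x, best_z


# Hypothetical 2KP instance (m = 2 constraints, n = 4 variables).
c = [10, 7, 5, 3]
a = [[4, 3, 2, 1],
     [2, 3, 1, 2]]
b = [6, 5]
best_x, best_z = solve_by_enumeration(c, a, b)
print(best_x, best_z)
```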

MKP  is  known  to  be  NP-hard  (Frieze  and  Clarke,  1984),  meaning  that  there 
is  no  known  polynomial-time  solution  algorithm  for  MKP.  As  n  increases,  exact 
solution  methods,  such  as  branch-and-bound,  may  require  large  commitments  of 
computing  resources.  Consequently,  heuristics  are  often  used  to  find  solutions 
that  are  close  to  the  optimum  at  a  fraction  of  the  computational  cost  of  an  exact 
algorithm.  Much  of  the  recent  research  on  MKP  investigates  improved  heuristics. 


3.2.2  Empirical  studies  involving  problems  with  correlated 
coefficients 

Many studies have examined the effect of correlation between objective function
and constraint coefficients in synthetic optimization problems on the performance
of solution procedures. A common, or “legacy,” aspect of these studies is the test
problem generation methods employed, which mimic the test problem generation
methods used in earlier studies, such as those by Martello and Toth (1979, 1981).
Martello and Toth (1979, 1988) and Balas and Zemel (1980) study knapsack
solution procedures, while Potts and Van Wassenhove (1988, 1992) and John (1989)
study solution procedures for scheduling problems. Yet, all use nearly the same
test problem generation method and all report significant performance degradation
of solution methods which they attribute to higher positive population correlation
between objective function and constraint coefficients. Martello and Toth (1981),
Fisher, Jaikumar, and Van Wassenhove (1986), Guignard and Rosenwein (1989),
Trick (1992), Mazzola and Neebe (1993) and Amini and Racer (1994) report
worsening solution procedure performance due to stronger, negative population
correlation levels between the objective function and capacity constraint coefficients in
the generalized assignment problem (GAP).

Interestingly, the correlation levels induced are not quantified in the studies
cited above. Common parameter settings for the generation methods, such as those
in Martello and Toth (1979), induce “weak” population correlation above 0.97,
while other settings, such as those in Martello and Toth (1981), induce population
correlation below -0.97. These generation methods are called “implicit correlation
induction” methods by Moore and Reilly (1993) because the correlation levels
induced are determined, or implied, by the parameters specified for the problem
generation method. Any desired variation in the population correlation requires
changing either the parameter settings or the form of the univariate marginal
distributions.
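
The idea of an implied correlation can be illustrated with a small sketch. The generator below is an assumption made for illustration only, not the scheme of any cited study: a constraint coefficient is drawn as $a \sim U(1, v)$ and the objective coefficient as $c = a + e$ with $e \sim U(-r, r)$ independent of $a$. Under that assumption the population correlation is $\sqrt{\mathrm{Var}(a) / (\mathrm{Var}(a) + \mathrm{Var}(e))}$, so it is fixed entirely by the hypothetical parameters $v$ and $r$.

```python
import math
import random


def implied_correlation(v, r):
    """Population correlation of a ~ U(1, v) and c = a + e, e ~ U(-r, r)."""
    var_a = (v - 1) ** 2 / 12.0
    var_e = (2 * r) ** 2 / 12.0
    return math.sqrt(var_a / (var_a + var_e))


def generate(n, v, r, rng):
    """Draw n coefficient pairs (c_j, a_j) under the assumed scheme."""
    a = [rng.uniform(1, v) for _ in range(n)]
    c = [aj + rng.uniform(-r, r) for aj in a]
    return c, a


def sample_correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical settings: a band of +/-10 around coefficients on [1, 100].
rng = random.Random(0)
c, a = generate(5000, 100, 10, rng)
print(implied_correlation(100, 10), sample_correlation(c, a))
```

Changing the induced correlation in such a scheme means changing $v$, $r$, or the form of the distributions, which is the limitation the paragraph above describes.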




The correlation level between two types of coefficients may be explicitly induced
and varied across the range of feasible correlation values. Moore and Reilly (1993)
use composition to induce a specified population correlation level between
objective function coefficients and constraint matrix column sums in weighted set
covering problems. Reilly (1991) and Yang (1994) induce correlation in the 0-1
knapsack problem and Pollock (1992) induces correlation in the weighted set
covering problem by generating coefficients based on various composite probability
mass functions (pmfs). Each of these four studies shows that increasing positive
correlation between the objective function and constraint coefficients degrades
solution procedure performance. Cario et al. (1995) induce various correlation levels
between objective function and capacity constraint coefficients in the GAP and
find that solution performance degrades with decreasing correlation between the
objective function and constraint coefficients. In addition, Cario et al. find that
GAP instances generated under explicit correlation induction are more challenging
than those generated under implicit correlation induction.

3.2.3  Some  empirical  studies  involving  MKP 

Table 3.1 summarizes the design of various studies of the performance of heuristics
for MKP. The current study is shown for comparison purposes. Past studies
of MKP heuristics indicate that problem size, the distribution of the constraint
coefficients, and the method used to determine the right-hand side coefficients (or
constraint slackness) influence heuristic performance.

Table 3.1: Factors and Measures Used in Previous Empirical Studies of MKP
Heuristics

Column order: Factors (m, n, S, D, S), Measures (Tm, Err, OpS, Iter)

Study Authors                  Problems     Factors and measures marked
                               Generated
Toyoda (1975)                  904          o o o o
Loulou & Michaelides (1979)    2250         o o o o o
Balas & Martin (1980)          41           o o o o o
Pirkul (1987)                  230          o o o o o o
Zanakis (1977)                 135          o o o o o
Freville & Plateau (1993)      Lots         o o o o o o
Freville & Plateau (1994)      610          o o o o o o o
Current study                  2240         o o o o

m = number of constraints
n = number of decision variables
S = slackness of constraints
D = distribution of constraint coefficients
S = population correlation induced between problem coefficients
Tm = CPU time required
Err = measure of relative error between heuristic and optimal solution value
OpS = number of problems solved to optimality
Iter = number of iterations


Pirkul (1987) and Balas and Martin (1980) implicitly induce population
correlation between the objective function and constraint coefficients for each variable
of approximately 0.66. In addition, they induce correlations of about 0.43
between the coefficients in every pair of constraints. Generally, the performance of
the heuristics worsens as the objective function-constraint correlations increase.
However, there is no discussion of how their results might be influenced by the
interconstraint correlations. Freville & Plateau (1994) generate objective function
coefficients and right-hand side values as functions of independently generated
constraint coefficients. They conclude that independent problems are easier to solve
than the problems with correlation.

3.3  The  Test  Problem  Generation  Methods

Two multivariate correlation induction methods, each associated with a different
correlation measure, are used in this study. Both methods require the user to
specify the univariate marginal distributions and the correlation structure. The
composition method discussed in Hill (Chapter 2) is used to induce specified Pearson
product-moment population correlation structures, and the method presented
in Iman and Conover (1982) is used to approximately induce specified sample
Spearman rank correlation structures. Both methods are used to generate values
of trivariate random variables to represent the coefficients $(c_j, a_{1j}, a_{2j})$ for each
variable $x_j$ in 2KP.


3.3.1  Pearson  product-moment-based  correlation  induction 
method 

The Pearson product-moment correlation coefficient is a measure of the linear
dependence between two random variables. Hill (Chapter 2) shows how to construct a
multivariate composite distribution based on a specified Pearson product-moment
population correlation structure, using the extreme-correlation distributions
and the joint distribution under independence. For a $k$-variate random variable,
$Y$, each extreme-correlation distribution is a joint distribution for which each
correlation term is at either the extreme positive or extreme negative level. The
extreme-correlation distributions are denoted $h_i(y)$, $i = 1, 2, \ldots, 2^{k-1}$, and
the joint distribution under independence is denoted $h_0(y)$.

Chapter 2 provides formulas for computing composition probabilities, $\lambda_\ell$, $\ell =
0, 1, 2, 3, 4$, based on a specified correlation structure for a trivariate random
variable. A joint distribution with the specified correlation structure is the composite
distribution

$$h(y) = \sum_{\ell=0}^{4} \lambda_\ell h_\ell(y). \tag{3.4}$$

The value of $\lambda_\ell$, $\ell = 0, 1, 2, 3, 4$, represents the relative frequency of sampling based
on $h_\ell(y)$. When $\lambda_0$ is at its minimum value, the composite distribution is called a
Type L distribution. When $\lambda_0$ is at its maximum value, the composite distribution
is called a Type U distribution. The Type L and Type U distributions define a
range of composite distributions with a specified population correlation structure.
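
Sampling from the composite distribution (3.4) can be sketched as a two-stage draw: pick a component $\ell$ with probability $\lambda_\ell$, then draw from $h_\ell$. The code below is a minimal illustration under stated assumptions, not the generator used in the study. It assumes discrete uniform marginals on $\{1, \ldots, 100\}$, $\{1, \ldots, 40\}$, and $\{1, \ldots, 15\}$, and it assumes each extreme-correlation component is obtained by feeding one uniform number $u$, or its antithetic value $1 - u$, through the inverse distribution function of each marginal. The names `PATTERNS`, `UPPER`, and `sample_composite` are hypothetical.

```python
import random

# Assumed marginals for (c, a1, a2): discrete uniform on {1, ..., upper}.
UPPER = (100, 40, 15)

# Assumed construction of the four extreme-correlation components:
# +1 means "use u", -1 means "use 1 - u" in the inverse transform.
PATTERNS = ((1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1))


def inv_discrete_uniform(u, upper):
    """Inverse distribution function of U{1, ..., upper} at u in [0, 1]."""
    return min(upper, int(u * upper) + 1)


def sample_composite(lam, rng):
    """Draw one (c, a1, a2) from the mixture sum_l lam[l] * h_l of (3.4)."""
    comp = rng.choices(range(5), weights=lam)[0]
    if comp == 0:
        # Independence: a separate uniform number for every coordinate.
        us = [rng.random() for _ in UPPER]
    else:
        # Extreme correlation: one shared uniform number, possibly reflected.
        u = rng.random()
        us = [u if s > 0 else 1.0 - u for s in PATTERNS[comp - 1]]
    return tuple(inv_discrete_uniform(v, n) for v, n in zip(us, UPPER))


# Hypothetical composition probabilities (lam0, ..., lam4) summing to 1.
rng = random.Random(42)
coefficients = [sample_composite((0.2, 0.5, 0.1, 0.1, 0.1), rng) for _ in range(100)]
print(coefficients[:3])
```

With all weight on the first extreme component, the three coefficients rise and fall together; placing weight on $\lambda_0$ mixes in independently sampled triples.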


3.3.2  Spearman  rank  correlation-based  correlation  induc¬ 
tion  method 

The  Spearman  rank  correlation  coefficient  is  a  measure  of  the  monotonic  depen¬ 
dency  between  two  random  variables.  Let  M  be  a  specified  correlation  matrix. 
The  method  of  Iman  and  Conover  (1982)  may  be  used  to  induce  a  Spearman  rank 
correlation  structure,  given  by  M,  among  a  set  of  random  variables. 

Suppose $n$ observations of $k$ random variables with correlation structure $M$
are required. First generate two matrices, $R$ and $V$, such that $R$ is an $(n \times k)$
matrix of van der Waerden scores, randomized within each of the $k$ columns, and
$V$ is an $(n \times k)$ matrix of $n$ independent observations of each of the $k$ random
variables. Treat the $k$ columns of $R$ as $n$ observations of $k$ random variables
and compute $T$, the corresponding sample rank correlation matrix. Compute the
Choleski factorizations $A$ and $Q$ such that $T = AA'$ and $M = QQ'$. Compute

$$S = R(QA^{-1})', \tag{3.5}$$

which is a transformed matrix of scores. With this choice the sample correlation
matrix of $S$ is $(QA^{-1})\,T\,(QA^{-1})' = QQ' = M$. The $k$ columns of $n$ values in $S$ have
a sample rank correlation structure that approximates $M$. The entries in each
column of $V$ are reordered so that their rankings are the same as the rankings in
the corresponding columns of $S$. The sample Spearman rank correlation structure
of the shuffled matrix of observations, $V$, approximates the specified correlation
structure, $M$.

This method is applicable for any marginal distributions. However, since this
method involves computing Choleski factorizations, matrix inverses, and ranking of
the data, the method becomes more computationally intensive as $k$ or $n$ increases.
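
The steps above can be sketched in pure Python. This is a minimal illustration rather than a tuned implementation: the helper names are hypothetical, the scores are taken as $\Phi^{-1}(i/(n+1))$, $T$ is computed as the correlation matrix of the score columns, and the target matrix is assumed to be positive definite so that its Choleski factor exists.

```python
import random
from statistics import NormalDist


def cholesky(m):
    """Lower-triangular L with m = L L' (m symmetric positive definite)."""
    k = len(m)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = (m[i][i] - s) ** 0.5
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def lower_inverse(low):
    """Inverse of a lower-triangular matrix by forward substitution."""
    k = len(low)
    inv = [[0.0] * k for _ in range(k)]
    for c in range(k):
        for r in range(c, k):
            rhs = 1.0 if r == c else 0.0
            s = sum(low[r][p] * inv[p][c] for p in range(c, r))
            inv[r][c] = (rhs - s) / low[r][r]
    return inv


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def ranks(values):
    """0-based ranks; ties are broken by position."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = rank
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / (sxx * syy) ** 0.5


def iman_conover(v, target, rng):
    """Reorder each column of v (n x k) so rank correlations approximate target."""
    n, k = len(v), len(v[0])
    scores = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    cols = []
    for _ in range(k):  # R: scores randomized within each column
        col = scores[:]
        rng.shuffle(col)
        cols.append(col)
    t = [[pearson(cols[p], cols[q]) for q in range(k)] for p in range(k)]
    w = matmul(cholesky(target), lower_inverse(cholesky(t)))  # Q A^{-1}
    s = matmul(transpose(cols), transpose(w))  # S = R (Q A^{-1})'
    shuffled = []
    for j in range(k):  # give column j of v the rank order of column j of S
        target_ranks = ranks([row[j] for row in s])
        ordered = sorted(row[j] for row in v)
        shuffled.append([ordered[r] for r in target_ranks])
    return transpose(shuffled)


rng = random.Random(1)
M = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
V = [[rng.expovariate(1.0) for _ in range(3)] for _ in range(200)]
V_shuffled = iman_conover(V, M, rng)
spearman_12 = pearson(ranks([r[0] for r in V_shuffled]), ranks([r[1] for r in V_shuffled]))
print(round(spearman_12, 2))
```

Because only the order of the entries in each column changes, the marginal sample of every variable is preserved exactly, which is why the method works for any marginal distribution.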


3.4  The  Experiment  Design  and  Analysis  Methods 

The goal of this study is to gain a deeper understanding of how test problem
generation methods influence the performance of solution procedures. This is not an
advocacy study for a particular solution procedure or an experiment on
state-of-the-art solution methods for MKP (2KP). Rather, this is an investigation into how
the performance of representative techniques, the branch-and-bound code CPLEX
and the heuristic by Toyoda (1975), is affected by the correlation structure
between the coefficient types, and by other test problem characteristics.

Let $A^1 \sim U\{1, 2, \ldots, 40\}$ be the random variable representing the values of
the coefficients in the first constraint, $A^2 \sim U\{1, 2, \ldots, 15\}$ be the random
variable representing the values of the coefficients in the second constraint, and $C \sim
U\{1, 2, \ldots, 100\}$ be the random variable representing the values of the objective
function coefficients in 2KP. Different distributions for $A^1$ and $A^2$ virtually
guarantee that the two constraints in each 2KP instance are different. Suppose instead
that the distributions of $A^1$ and $A^2$ were identical. Then $\rho_{A^1A^2} \in [-1, 1]$, and for each
2KP instance generated with $\rho_{A^1A^2} = 1$, the coefficients for each variable would be
identical in both constraints.

The three correlation terms in the correlation structure of 2KP are $\rho_{CA^1}$,
$\rho_{CA^2}$, and $\rho_{A^1A^2}$, with $\rho = (\rho_{CA^1}, \rho_{CA^2}, \rho_{A^1A^2})$. When referring to a particular
correlation measure, $P = (\rho^P_{CA^1}, \rho^P_{CA^2}, \rho^P_{A^1A^2})$ denotes the Pearson
product-moment correlation structure while a Spearman rank correlation structure is
denoted $S = (\rho^S_{CA^1}, \rho^S_{CA^2}, \rho^S_{A^1A^2})$.

3.4.1  Definition  of  the  experiment  design  settings 

Three problem generation parameters are varied in this experiment: the correlation
structure between the sets of problem coefficients, the constraint slackness, and the
correlation measure (Pearson or Spearman). It is well established that problem
size influences solution procedure performance, so problem size is held constant
with two constraints (i.e., the 2KP) and 100 variables.

For the marginal distributions previously defined for $C$, $A^1$, and $A^2$, the ranges
of feasible Pearson correlation levels for each correlation term are:

$$\rho^P_{CA^1} \in [-0.99997, 0.99997], \tag{3.6}$$
$$\rho^P_{CA^2} \in [-0.99773, 0.99773], \tag{3.7}$$
$$\text{and} \quad \rho^P_{A^1A^2} \in [-0.99752, 0.99752]. \tag{3.8}$$

The population correlation structures are varied by systematically varying each
correlation term. Five equally-spaced correlation values across the feasible range
for each correlation term yield 125 potential correlation structures. However, 80
of these combinations yield would-be correlation matrices that are not positive
semi-definite. Table 3.2 lists the 45 feasible correlation structures; Figure 3.1
is a 3-dimensional plot of the feasible correlation structures. For each feasible
correlation structure, there is a composite joint distribution (3.4). For the Pearson
correlation induction method, the joint distribution for 34 of these 45 feasible
correlation structures is expressible only as a Type L composite distribution with
$\lambda_0 = 0$. For each correlation structure in Table 3.2 marked with a •, there are
composite distributions with $\lambda_0 > 0$ in addition to a Type L distribution. For
these correlation structures, both the Type L and Type U forms of the composite
distribution are used in the experiment.
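
The screening of the 125 grid points can be sketched as follows. This is an illustration built on assumptions, not on the closed forms of Chapter 2: each correlation term is rescaled by its extreme value so the grid becomes $\{-1, -0.5, 0, 0.5, 1\}^3$, the four extreme-correlation distributions are assumed to sit at the sign patterns in `SIGNS`, and a point is treated as feasible when it can be written as a mixture of those four points with nonnegative weights. The sketch tests this composition feasibility directly rather than positive semi-definiteness; all names are hypothetical.

```python
from itertools import product

# Rescaled grid: each term divided by its extreme feasible value.
GRID = (-1.0, -0.5, 0.0, 0.5, 1.0)

# Assumed sign patterns (rho_CA1, rho_CA2, rho_A1A2) of the four
# extreme-correlation distributions in rescaled units.
SIGNS = ((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))


def type_l_weights(x, y, z):
    """Weights on the four extreme points when lam0 = 0 (they sum to 1)."""
    return tuple((1 + s[0] * x + s[1] * y + s[2] * z) / 4 for s in SIGNS)


def max_lambda0(x, y, z):
    """Largest weight that can be moved to independence (0 if on the boundary)."""
    return 4 * min(type_l_weights(x, y, z))


feasible = [p for p in product(GRID, repeat=3) if min(type_l_weights(*p)) >= 0]
interior = [p for p in feasible if min(type_l_weights(*p)) > 0]
print(len(feasible), len(interior))
```

The two counts can be compared with the 45 feasible structures and the 11 structures marked with a • in Table 3.2. Points with a zero weight admit only the Type L form, while interior points leave room for $\lambda_0 > 0$.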

A “slackness” measure for constraint $i$, $S_i$, is defined as the ratio of the
right-hand side coefficient in constraint $i$ to the sum of the coefficients in that constraint.
Low slackness values give “tight” constraints and high slackness values give “loose”
constraints. Constraints are “mixed” if both low and high slackness values are
specified for the same test problem. Table 3.3 summarizes the slackness levels
used in the studies cited previously. Two levels of slackness are examined in this
study: $S_i = 0.30, 0.70$, $i = 1, 2$. Each of the four possible settings of $S_1$ and $S_2$
is referred to as a constraint slackness setting. Since the marginal distribution
of $A^1$ differs from the marginal distribution of $A^2$, $(S_1, S_2) = (0.30, 0.70)$ is
considered to be a different slackness setting than $(S_1, S_2) = (0.70, 0.30)$.
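
As a small illustration of the slackness definition, the sketch below derives a right-hand side from a slackness level and enumerates the four constraint slackness settings. The function names and the sample coefficients are hypothetical, and no rounding of the right-hand side is applied because the text does not specify one.

```python
def rhs_from_slackness(a_row, s):
    """Right-hand side b_i implied by slackness s for one constraint row."""
    return s * sum(a_row)


def slackness(a_row, b):
    """Slackness S_i = b_i / sum_j a_ij."""
    return b / sum(a_row)


LEVELS = (0.30, 0.70)
SETTINGS = [(s1, s2) for s1 in LEVELS for s2 in LEVELS]

# Hypothetical coefficients for the two constraints of a small 2KP.
a1 = [12, 35, 7, 26]
a2 = [3, 14, 9, 4]
for s1, s2 in SETTINGS:
    print((s1, s2), rhs_from_slackness(a1, s1), rhs_from_slackness(a2, s2))
```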

Table 3.2: Experiment Design Correlation Structures

Columns: Number, $\rho_{CA^1}$, $\rho_{CA^2}$, $\rho_{A^1A^2}$ (two blocks side by side)

Number     rho_CA1    rho_CA2   rho_A1A2     Number     rho_CA1    rho_CA2   rho_A1A2
  1        0.99997    0.99773    0.99752      24 •     -0.49999    0.00000    0.00000
  2        0.49999    0.49887    0.99752      25       -0.99997    0.00000    0.00000
  3        0.00000    0.00000    0.99752      26        0.49999   -0.49887    0.00000
  4       -0.49999   -0.49887    0.99752      27 •      0.00000   -0.49887    0.00000
  5       -0.99997   -0.99773    0.99752      28       -0.49999   -0.49887    0.00000
  6        0.49999    0.99773    0.49876      29        0.00000   -0.99773    0.00000
  7        0.99997    0.49887    0.49876      30       -0.49999    0.99773   -0.49876
  8 •      0.49999    0.49887    0.49876      31        0.00000    0.49887   -0.49876
  9        0.00000    0.49887    0.49876      32 •     -0.49999    0.49887   -0.49876
 10        0.49999    0.00000    0.49876      33       -0.99997    0.49887   -0.49876
 11 •      0.00000    0.00000    0.49876      34        0.49999    0.00000   -0.49876
 12       -0.49999    0.00000    0.49876      35 •      0.00000    0.00000   -0.49876
 13        0.00000   -0.49887    0.49876      36       -0.49999    0.00000   -0.49876
 14 •     -0.49999   -0.49887    0.49876      37        0.99997   -0.49887   -0.49876
 15       -0.99997   -0.49887    0.49876      38 •      0.49999   -0.49887   -0.49876
 16       -0.49999   -0.99773    0.49876      39        0.00000   -0.49887   -0.49876
 17        0.00000    0.99773    0.00000      40        0.49999   -0.99773   -0.49876
 18        0.49999    0.49887    0.00000      41       -0.99997    0.99773   -0.99752
 19 •      0.00000    0.49887    0.00000      42       -0.49999    0.49887   -0.99752
 20       -0.49999    0.49887    0.00000      43        0.00000    0.00000   -0.99752
 21        0.99997    0.00000    0.00000      44        0.49999   -0.49887   -0.99752
 22 •      0.49999    0.00000    0.00000      45        0.99997   -0.99773   -0.99752
 23 •      0.00000    0.00000    0.00000

Table 3.3: Slackness Settings From Previous MKP Studies

Study Authors                 Slackness Setting $S_i$
Toyoda                        $S_i = 0.67$
Loulou & Michaelides          $b_i = 1 \;\forall\, i = 1, 2, \ldots, m$
Balas & Martin                $S_i \sim U(0.5, 0.9)$
Pirkul                        $S_i = 0.50$
Zanakis                       $S_i = 0.30, 0.50, 0.90$
Freville & Plateau (1993)     $S_i = 0.25, 0.50, 0.75$
Freville & Plateau (1994)     $S_i = 0.25, 0.50, 0.75$
Current study                 $S_i = 0.30, 0.70$

Figure 3.1: Three Dimensional Plot of Experiment Design Correlation Structures

One purpose of this study is to examine how the correlation measure (Pearson
or Spearman) used affects solution procedure performance. For each specified
correlation structure and constraint slackness setting combination, five test problems
are generated using each of the Pearson correlation induction method (i.e.,
composition) and the Spearman correlation induction method (i.e., Iman and Conover’s
method). Random numbers are not synchronized since the shuffling involved in
Iman and Conover’s method would undermine any synchronization.

Each combination of correlation structure, constraint slackness setting, and
correlation measure forms an experiment design point. With 45 correlation structures,
four constraint slackness settings, two correlation measures, and five replications
for each design point, 1800 optimization test problems are generated. Additionally,
the 11 correlation structures in Table 3.2 that may be represented by a Type U
composite distribution provide an additional 440 test problems, for a total of 2240
2KP test problems generated for this study.

A representative algorithm and heuristic were chosen based on availability and
general acceptance of the procedures. CPLEX from CPLEX Optimization, Inc.,
is contained in many commercially available packages and is available as a
standalone product (see review by Saltzman, 1994). The mixed-integer optimizer in
CPLEX, version 2.1, was selected and utilized in a depth-first search,
branch-and-bound mode. Many of the studies involving heuristics for MKP benchmark against
Toyoda’s (1975) heuristic. Hence, Toyoda’s heuristic was chosen as a representative
heuristic. Hereafter, these procedures are referred to as CPLEX and TOYODA,
respectively.

There are a variety of performance measures for assessing solution procedure
effectiveness and efficiency. Typical performance measures for branch-and-bound
procedures include CPU time, iteration count, or the number of nodes enumerated
in the branch-and-bound tree. The three measures are clearly related to one
another. The number of nodes is used in this study and is referred to as NODES.
Typical measures for heuristics include CPU time, iteration count, or relative error.
This study uses the relative error denoted as REL, where

$$\mathrm{REL} = 100 \times \frac{Z_{IP} - Z_H}{Z_{IP}},$$

where $Z_H$ is the heuristic solution value and $Z_{IP}$ is the optimal (or best known)
integer solution value for the 2KP.

The size of the LP-IP gap in an optimization problem is often viewed as a
factor influencing the performance of solution procedures (Chang and Shepardson,
1982). In this study, the influence of the factors in the experiment on the size of
the LP-IP gap is briefly examined.


3.4.2  Methods  for  analyzing  results 

Two non-parametric statistical tests, the sign test and the Kruskal-Wallis (KW)
test, are used to analyze the data from the experiment. These tests are summarized
below; additional details are found in Conover (1980).

The sign test is useful for establishing whether observations from one
population tend to differ in magnitude when compared to observations from another
population. Let $X_i^{(1)}$ and $X_i^{(2)}$, $i = 1, 2, \ldots, n$, be $n$ observations from two
populations paired in some logical fashion, and let $d_i = X_i^{(1)} - X_i^{(2)}$, $i = 1, 2, \ldots, n$,
be the differences between the observations. If there is no difference in magnitude
between the populations, the number of positive signs among the $d_i$ follows the
binomial distribution with $p = 0.5$. Therefore, the null and alternative hypotheses
are:

$H_0$: Pr(+) = Pr(-)
$H_1$: Pr(+) $\neq$ Pr(-)

where Pr(+) and Pr(-) are, respectively, the probabilities of positive and negative
signs on each $d_i$. The test statistic, $T_1$, is the total number of positive $d_i$s, ignoring
ties. The decision rule is to reject $H_0$ at the $\alpha$ level of significance if the probability
of observing $T_1$ positive $d_i$s under a true null hypothesis is less than $\alpha$. The primary
use of the sign test in this study is to test whether there is a difference in solution
procedure performance due to the correlation measure.
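The sign test described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: the two-sided p-value is taken as twice the smaller binomial tail (capped at 1), and the function names are hypothetical.

```python
from math import comb


def binomial_two_sided_p(t: int, n: int) -> float:
    """2 * min(P(T <= t), P(T >= t)) for T ~ Binomial(n, 0.5), capped at 1."""
    if n == 0:
        return 1.0
    lower = sum(comb(n, k) for k in range(0, t + 1))   # counts for P(T <= t)
    upper = sum(comb(n, k) for k in range(t, n + 1))   # counts for P(T >= t)
    return min(1.0, 2 * min(lower, upper) / 2 ** n)


def sign_test(x, y):
    """Return (T1, number of non-tied pairs, two-sided p-value) for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]    # ties are ignored
    n = len(diffs)
    t1 = sum(1 for d in diffs if d > 0)
    return t1, n, binomial_two_sided_p(t1, n)


# Hypothetical paired observations (one tie, two positive, one negative difference).
print(sign_test([5, 7, 9, 4], [3, 7, 2, 6]))
```

Exact integer arithmetic is used for the binomial tails so that the calculation stays accurate even for samples with more than a thousand pairs.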

The KW test is a rank test for differences among the means in $m$ populations.
Let $X_{ij}$, $i = 1, 2, \ldots, n_j$, $j = 1, 2, \ldots, m$, be the $i$th observation from the
$j$th population and $R(X_{ij})$, $i = 1, 2, \ldots, n_j$, $j = 1, 2, \ldots, m$, be the overall rank of
each observation among all $N = \sum_{j=1}^{m} n_j$ observations. The null and alternative
hypotheses are:

$H_0$: All $m$ population distribution functions have identical means,

$H_1$: The $m$ population distribution functions do not have identical means.

Define $R_j = \sum_{i=1}^{n_j} R(X_{ij})$, $j = 1, 2, \ldots, m$. The test statistic $T_2$ for the KW
test is

$$T_2 = \frac{1}{S^2} \left( \sum_{j=1}^{m} \frac{R_j^2}{n_j} - \frac{N(N+1)^2}{4} \right)$$

where

$$S^2 = \frac{1}{N-1} \left( \sum_{j=1}^{m} \sum_{i=1}^{n_j} R(X_{ij})^2 - \frac{N(N+1)^2}{4} \right) \qquad (3.11)$$

The decision rule is to reject $H_0$ at the $\alpha$ level of significance if $T_2$ exceeds the
$1 - \alpha$ quantile of the chi-square distribution with $m - 1$ degrees of freedom. The
KW test is used in this study to test whether or not there are solution procedure
performance differences due to either the correlation structure or the constraint
slackness setting.
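The KW statistic can be sketched in plain Python as follows. This is an illustrative implementation of the rank-based form given in Conover (1980), with tied observations receiving average ranks; the function names and sample data are hypothetical, and the sketch does not handle the degenerate case in which every observation is tied.

```python
def average_ranks(values):
    """Ranks 1..N for the pooled sample; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """KW statistic T2 for a list of samples, one list per population."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    correction = n_total * (n_total + 1) ** 2 / 4
    s2 = (sum(r * r for r in ranks) - correction) / (n_total - 1)
    total, start = 0.0, 0
    for g in groups:
        r_j = sum(ranks[start:start + len(g)])   # rank sum R_j for this population
        total += r_j ** 2 / len(g)
        start += len(g)
    return (total - correction) / s2


# Hypothetical data: three well-separated samples of size three.
print(kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For this example $m - 1 = 2$, and the 0.95 quantile of the chi-square distribution with 2 degrees of freedom is about 5.991, so a statistic above that value would lead to rejection at $\alpha = 0.05$.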

In addition to the sign and KW tests, regression models are constructed to
quantify the relationships between the experiment design parameters and each
performance measure. The models constructed based on the 2KP experiment were
developed using a stepwise regression procedure to obtain the regression model
that maximizes the value of the coefficient of determination, $R^2$.


3.5 Comparing Samples From The Correlation Induction
Methods

One motivation for this study is to examine whether solution procedure
performance is affected by differences in the form of the underlying multivariate
distribution of coefficient values. A simple experiment with a bivariate random
variable provides some insight into the differences in the form of the underlying
distribution associated with each correlation measure. (The correlation structure
also affects the form of a joint distribution, even for a fixed correlation
measure.) Assume that $Y_1$ and $Y_2$ are each discrete uniform random variables where
$Y_1 \sim U(1, 2, \ldots, 20)$, $Y_2 \sim U(1, 2, \ldots, 10)$, and $\mathrm{Corr}(Y_1, Y_2) = 0.49876$. The
sample joint distributions that result from 100,000 observations from the generation
method for each correlation measure are shown in Figures 3.2 and 3.3. To simplify
the generation of data based on the Spearman measure, the 100,000 observations
came from 1000 replications of 100 observations each.
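To make the idea of sampling from a mixture concrete, the sketch below draws pairs from a two-component mixture of a maximal-correlation (comonotone) joint distribution and the joint distribution under independence. This is an assumption-laden illustration only: it is not the Type L or Type U composite distribution used in this study, it handles only nonnegative targets, and all function names are hypothetical. With probability $\lambda$ both variates are generated from the same uniform random number; otherwise they are generated independently, so the population correlation is $\lambda$ times the maximal correlation.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def max_correlation(n1, n2):
    """Correlation of U{1..n1} and U{1..n2} when both use the same uniform number."""
    cells = n1 * n2                      # equally likely slices of the unit interval
    y1 = [k * n1 // cells + 1 for k in range(cells)]
    y2 = [k * n2 // cells + 1 for k in range(cells)]
    return pearson(y1, y2)


def sample_pairs(n1, n2, target, size, seed=0):
    """Draw pairs whose population correlation is `target` (0 <= target <= max)."""
    rng = random.Random(seed)
    lam = target / max_correlation(n1, n2)   # mixing probability
    if not 0.0 <= lam <= 1.0:
        raise ValueError("target correlation is outside the attainable range")
    pairs = []
    for _ in range(size):
        u = rng.random()
        y1 = min(int(u * n1), n1 - 1) + 1
        if rng.random() < lam:
            y2 = min(int(u * n2), n2 - 1) + 1             # reuse u: comonotone part
        else:
            y2 = min(int(rng.random() * n2), n2 - 1) + 1  # fresh draw: independence
        pairs.append((y1, y2))
    return pairs


pairs = sample_pairs(20, 10, 0.49876, 100_000, seed=1)
y1, y2 = zip(*pairs)
print(round(max_correlation(20, 10), 5), round(pearson(y1, y2), 3))
```

Because both components share the same marginal distributions, the mixture preserves the marginals while the covariance scales linearly with the mixing probability.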

Figure 3.2 contains a sample pmf (multiplied by 10000) for Pearson
product-moment correlation induction using a Type U distribution. There is a minimal
probability in each cell and a concentration of probability along the upper left to
lower right "diagonal" of cells. This concentration of probability is characteristic
of compositions involving extreme-correlation distributions (Devroye, 1986). The
concentration of probability along the diagonal is minimized (maximized) and the
minimum probability in each cell is maximized (minimized) with Type U (Type L)
distributions (Hill and Reilly, 1994). Suppose 100,000 observations of $(Y_1, Y_2)$ were
generated based on a Type L distribution. Then, all of the observations would
be concentrated on the upper left to lower right and lower left to upper right
diagonals. The cells off these diagonals would have probability zero.

Figure 3.3 contains a sample pmf (multiplied by 10000) for Spearman rank
correlation induction. There is a wide band of probability along the main "diagonal"
discussed for Figure 3.2; several cells have very small probability. Generally, the
probability in the cells drops off as one looks at cells further and further away from
the main diagonal.

For each of the 2240 problems generated for this study, sample Pearson
product-moment and Spearman rank correlation values were computed. Table 3.4
summarizes these sample correlations by correlation induction method and by the value
specified for each correlation term. The data in Table 3.4 indicate that both
correlation induction methods effectively generate data with the specified correlation
structure. The standard errors of the average sample correlation values under the
Spearman induction method are generally smaller than the corresponding standard
errors under the Pearson induction method. This phenomenon may be explained
by recognizing the different approach to inducing correlation that these two variate-

Figure 3.2: Sample Distributional Form From Pearson Induction Method,
$\rho = 0.49876$

Figure 3.3: Sample Distributional Form From Spearman Induction Method,
$\rho = 0.49876$

Table 3.4: Sample Correlations by Target and Induction Type

                    Target     Number of   Pearson Induction      Spearman Induction
Correlation Term    Value      Problems    Mean       Std Error   Mean       Std Error
------------------  ---------  ---------   ---------  ---------   ---------  ---------
$\rho_{CA^1}$        0.99997    100         0.98968    0.00007     0.99914    0.00002
                     0.49999    280         0.49524    0.00645     0.47747    0.00102
                     0.00000    360        -0.00328    0.00668     0.00068    0.00078
                    -0.49999    280        -0.49764    0.00656    -0.47494    0.00107
                    -0.99997    100        -0.98967    0.00005    -0.99737    0.00004
$\rho_{CA^2}$        0.99773    100         0.98780    0.00003     0.99436    0.00009
                     0.49887    280         0.48964    0.00658     0.47724    0.00123
                     0.00000    360        -0.01640    0.00679     0.00350    0.00092
                    -0.49887    280        -0.49967    0.00625    -0.47052    0.00122
                    -0.99773    100        -0.98775    0.00003    -0.98762    0.00014
$\rho_{A^1A^2}$      0.99752    100         0.98756    0.00003     0.99318    0.00010
                     0.49876    280         0.48105    0.00614     0.47468    0.00140
                     0.00000    360        -0.00254    0.00693     0.00515    0.00110
                    -0.49876    280        -0.49975    0.00626    -0.46844    0.00138
                    -0.99752    100        -0.98762    0.00003    -0.98564    0.00017


generation methods use. The Pearson induction method samples from a composite
distribution with a specified Pearson product-moment population correlation
structure, while the Spearman induction method (Iman and Conover, 1982) targets a
specified sample rank correlation structure.

Table 3.5 summarizes Spearman sample correlations for the 2KP coefficients
generated with the Pearson method, and vice versa. Consider the mean values
reported in Table 3.5. The Pearson method effectively generates data with the
specified Spearman rank correlation structure. However, the Spearman method
is less effective at generating data with a specified Pearson product-moment
correlation structure. In both cases, standard errors are small, and they are smaller
for the problems generated with the Spearman method.
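The distinction between the two sample measures can be illustrated with a short sketch that computes both on the same data. The helper names and the data are hypothetical; Spearman rank correlation is computed here as the Pearson correlation of average ranks, which is the usual definition when ties are present.

```python
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Sample Spearman rank correlation (Pearson correlation of the ranks)."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical monotone but nonlinear relationship.
xs = list(range(1, 11))
ys = [x ** 3 for x in xs]
print(spearman(xs, ys), pearson(xs, ys))
```

For a strictly increasing nonlinear relationship the rank correlation is 1 while the product-moment correlation is below 1, which is one reason data generated to match one measure need not match the other.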

Table 3.5: Sample Correlations by Target, Method, and Alternate Measure

                                           Pearson Method         Spearman Method
                    Target     Number of   Spearman Value         Pearson Value
Correlation Term    Value      Problems    Mean       Std Error   Mean       Std Error
------------------  ---------  ---------   ---------  ---------   ---------  ---------
$\rho_{CA^1}$        0.99997    100         0.99943    0.00010     0.98218    0.00046
                     0.49999    280         0.50093    0.00651     0.46857    0.00120
                     0.00000    360        -0.00311    0.00669    -0.00047    0.00100
                    -0.49999    280        -0.50085    0.00656    -0.46815    0.00130
                    -0.99997    100        -0.99773    0.00008    -0.98219    0.00035
$\rho_{CA^2}$        0.99773    100         0.99698    0.00005     0.97821    0.00047
                     0.49887    280         0.49608    0.00667     0.46831    0.00140
                     0.00000    360        -0.01349    0.00680    -0.00002    0.00108
                    -0.49887    280        -0.49948    0.00624    -0.46805    0.00138
                    -0.99773    100        -0.99039    0.00011    -0.97792    0.00053
$\rho_{A^1A^2}$      0.99752    100         0.99692    0.00003     0.97691    0.00050
                     0.49876    280         0.48805    0.00618     0.46546    0.00146
                     0.00000    360         0.00172    0.00690     0.00135    0.00119
                    -0.49876    280        -0.49782    0.00629    -0.46719    0.00149
                    -0.99752    100        -0.98949    0.00011    -0.97687    0.00047

Table 3.6: Performance Measure Averages by Correlation Measure

Performance    Mean Performance Measure
Measure        Pearson      Spearman
-----------    ---------    ---------
NODES          2337.50      3748.83
REL               0.77         1.50

Table 3.7: Results of Sign Test on Performance Measures

Performance    Total            Total         Acceptance
Measure        $d_i \neq 0$     $d_i > 0$     Region        p-value
-----------    -------------    ----------    ----------    --------
NODES          1111             354           (522,588)     < 0.0001
REL            1095             759           (515,580)     < 0.0001


3.6  Influence  of  Population  Correlation  Measure 

Table 3.6 summarizes the results for each performance measure by population
correlation measure. The test problems generated based on Spearman rank correlation
require more branch-and-bound nodes with CPLEX and have larger relative errors
with the TOYODA heuristic than the problems generated based on the Pearson
correlation induction method.

Suppose  the  data  are  separated  by  correlation  measure  (i.e.,  by  induction 
method)  and  then  paired  by  design  point  and  replication  number  to  develop  the 
vector  of  differences  used  in  a  sign  test.  Table  3.7  provides  the  sign  test  results 
for  each  performance  measure  and  the  a  =  0.05  acceptance  regions.  These  results 
indicate  that  test  problems  based  on  the  Spearman  correlation  measure  require 
more  NODES  by  CPLEX  and  have  a  larger  REL  for  TOYODA. 
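
The acceptance regions can be checked approximately with the normal approximation to the binomial distribution. The following worked arithmetic is a sketch, assuming a two-sided test at $\alpha = 0.05$ and using the REL row of Table 3.7 ($n = 1095$ nonzero differences); the regions reported in the table may have been obtained by a slightly different calculation.

```latex
% Under H_0, T_1 ~ Binomial(n, 1/2), with mean n/2 and standard deviation sqrt(n)/2.
\[
\frac{n}{2} \pm 1.96\,\frac{\sqrt{n}}{2}
  = \frac{1095}{2} \pm 1.96 \times \frac{\sqrt{1095}}{2}
  \approx 547.5 \pm 1.96 \times 16.55
  \approx 547.5 \pm 32.4 ,
\]
% giving approximately (515.1, 579.9), in line with the tabulated region (515, 580).
% The observed count T_1 = 759 lies far outside this interval.
```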

A sign test was applied to the data associated with each of the 45 population
correlation structures listed in Table 3.2. Table 3.8 provides the p-values associated
with each of these 45 tests. An asterisk (*) highlights those tests with p-values
below 0.05, 28 tests for NODES data and 21 for REL data. For all tests significant
at the $\alpha = 0.05$ level, problems based on the Spearman induction method either
required more NODES or had a higher REL. Within the REL column of Table 3.8 the
*s tend to be associated with $\rho_{CA^1}$ and $\rho_{CA^2}$ values above 0.4, while there is no
such obvious pattern to the *s within the NODES column.

Table 3.9 provides sign test results by Type L and Type U distributions for the
eleven correlation structures permitting both types of composite distributions. An
asterisk (*) indicates those cases where there is a significant difference in
performance between problems generated based on the Pearson measure and problems
based on the Spearman measure for an $\alpha = 0.05$ significance level. Test
problems based on the Pearson measure with a Type L distribution are more likely
to produce results different from their Spearman measure counterparts (i.e., fewer
NODES, larger REL value) than those test problems with a Type U distribution.
This phenomenon is observed because a Type U distribution more closely
resembles the underlying distribution for the Spearman correlation induction method
than does a Type L distribution.



Table 3.8: Sign Test Results On Each Correlation Structure

Correlation Values                               NODES            REL
$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$    p-value†         p-value†
-------------  -------------  ---------------    -------------    -------------
 0.99997        0.99773        0.99752             0.0835           0.0059 *
 0.49999        0.49887        0.99752             0.0059 *         0.0577
 0.00000        0.00000        0.99752             0.2403           0.6762
-0.49999       -0.49887        0.99752             0.0013 *         0.4073
-0.99997       -0.99773        0.99752             0.0577           0.1250
 0.49999        0.99773        0.49876             0.1796           0.0013 *
 0.99997        0.49887        0.49876           < 0.0001 *       < 0.0001 *
 0.49999        0.49887        0.49876             0.0011 *         0.2148
 0.00000        0.49887        0.49876             0.0207 *         0.1316
 0.49999        0.00000        0.49876             0.0059 *         0.0013 *
 0.00000        0.00000        0.49876             0.0011 *         0.9459
-0.49999        0.00000        0.49876             0.4119           0.4119
 0.00000       -0.49887        0.49876             0.0002 *         0.5881
-0.49999       -0.49887        0.49876             0.6821           0.0541
-0.99997       -0.49887        0.49876             0.4119           0.0245 *
-0.49999       -0.99773        0.49876             0.1316           0.0577
 0.00000        0.99773        0.00000             0.0835           0.0002 *
 0.49999        0.49887        0.00000             0.0013 *         0.1316
 0.00000        0.49887        0.00000           < 0.0001 *         0.0003 *
-0.49999        0.49887        0.00000             0.0207 *         0.0013 *
 0.99997        0.00000        0.00000             0.1316           0.0002 *
 0.49999        0.00000        0.00000           < 0.0001 *         0.0083 *
 0.00000        0.00000        0.00000             0.0192 *         0.4373
-0.49999        0.00000        0.00000             0.3746           0.0769
-0.99997        0.00000        0.00000             0.0022 *         0.1316
 0.49999       -0.49887        0.00000             0.1316           0.0059 *
 0.00000       -0.49887        0.00000             0.0192 *         0.5627
-0.49999       -0.49887        0.00000             0.9423           0.2517
 0.00000       -0.99773        0.00000           < 0.0001 *         0.2517
-0.49999        0.99773       -0.49876             0.0096 *         0.0577
 0.00000        0.49887       -0.49876             0.0013 *         0.0002 *
-0.49999        0.49887       -0.49876             0.0403 *         0.0001 *
-0.99997        0.49887       -0.49876             0.0577           0.0059 *
 0.49999        0.00000       -0.49876             0.0002 *         0.0002 *
 0.00000        0.00000       -0.49876             0.0192 *         0.5627
-0.49999        0.00000       -0.49876             0.0577 *         0.1316
 0.99997       -0.49887       -0.49876             0.0013 *         0.0013 *
 0.49999       -0.49887       -0.49876             0.0032 *       < 0.0001 *
 0.00000       -0.49887       -0.49876             0.8684           0.5881
 0.49999       -0.99773       -0.49876             0.0002 *         0.0002 *
-0.99997        0.99773       -0.99752             0.1316           0.0059 *
-0.49999        0.49887       -0.99752             0.0002 *       < 0.0001 *
 0.00000        0.00000       -0.99752             0.1316           0.4119
 0.49999       -0.49887       -0.99752             0.0002 *         0.0013 *
 0.99997       -0.99773       -0.99752             0.0004 *         0.1316

† Null hypothesis of no difference

Table 3.9: Sign Test Results For Type L Versus Type U Distributions

                                                 p-values
Correlation Values                               NODES                     REL
$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$    Type L      Type U        Type L      Type U
-------------  -------------  ---------------    ---------   ---------     ---------   ---------
 0.49999        0.49887        0.49876           0.9940 *    0.9793 *      0.0577      0.7483
 0.49999        0              0                 0.9987 *    0.9998 *      0.0002 *    0.0588
 0.49999       -0.49887       -0.49876           0.9940 *    0.9423        0.0000 *    0.0577
 0              0.49887        0                 0.9987 *    0.9940 *      0.0013 *    0.0577
 0              0              0.49876           0.9999 *    0.7483        0.6762      0.9793 *
 0              0              0                 0.9940 *    0.7483        0.0577      0.9423
 0              0             -0.49876           0.9423      0.9423        0.4783      0.4119
 0             -0.49887        0                 0.9940 *    0.7483        0.4119      0.7483
-0.49999        0.49887       -0.49876           0.9987 *    0.4119        0.0000 *    0.1316
-0.49999        0              0                 0.6762      0.5881        0.0207 *    0.5881
-0.49999       -0.49887        0.49876           0.8684      0.0835        0.5000      0.0207 *


Based on the results in this study, it appears that the choice of correlation
measure influences the performance of solution procedures on synthetic test
problems. The next issue is whether the population correlation structure and constraint
slackness settings, for test problems generated based on each type of correlation
measure, influence solution procedure performance. CPLEX performance is
examined first, followed by similar analyses of TOYODA performance.


3.7  Analysis  of  CPLEX  performance 

In this section, the influence of the population correlation structure and constraint
slackness settings on CPLEX performance is examined. Two regression models
are constructed to summarize the effects of correlation structure and constraint
slackness on CPLEX performance.

3.7.1 Correlation structure influence

Tables  3.10  and  3.11  summarize  CPLEX  results  for  Pearson  and  Spearman  prob¬ 
lems,  respectively.  An  artificial  limit  of  250,000  NODES  was  imposed  on  CPLEX 
processing.  The  rightmost  columns  in  Tables  3.10  and  3.11  indicate  how  many, 
if  any,  problems  were  not  solved  to  optimality  in  this  study  due  to  this  limit. 
The  number  of  CPLEX  NODES  varies  greatly  as  the  population  correlation  struc¬ 
ture  changes. 

A KW test was conducted on the data grouped by population correlation
structure to test for a difference in average NODES due to correlation structure. The
KW test statistics of 180.95 for Pearson correlation problems and 135.59 for
Spearman correlation problems equate to p-values of less than $1.0 \times 10^{-10}$ for each test.
Clearly, there is a CPLEX performance difference due to correlation structure.

Independent sampling is represented by $\rho = (0, 0, 0)$, Type U in Table 3.10.
Independent sampling is a generally accepted method of generating test problems.
However, these results suggest that generating test problems with independent
sampling only provides little information about the full range of test problem
difficulty that is observed with a more systematic problem generation scheme involving
correlation induction. In fact, independent sampling seems to provide information
only about median performance. To appreciate the CPLEX performance variation
possible with different correlation structures, one need only scan down the columns
of Tables 3.10 and 3.11 and notice the range in average NODES, the corresponding
standard errors, and the drastic difference in average NODES between the first


Table 3.10: CPLEX Results using Pearson Correlation Induction

        Correlation Values                               Mean        Standard     Not
Type    $\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$    NODES       Error        Solved
----    -------------  -------------  ---------------    --------    ---------    ------
L        0             -0.49887       -0.49876           30284.1     12347.01     2
L        0.99997       -0.99773       -0.99752           27947.5     11937.36     1
L       -0.99997        0.99773       -0.99752           25250.4     12153.85     1
L       -0.99997       -0.99773        0.99752           21507.3      8969.49     1
U        0.49999       -0.49887       -0.49876            3749.9      2822.78
L        0              0             -0.99752            2206.5       954.87
L       -0.99997       -0.49887        0.49876            2051.4      1166.00
L       -0.99997        0.49887       -0.49876            1849.1       703.87
L       -0.49999        0             -0.49876            1752.7       770.87
L        0              0             -0.49876            1534.2       923.33
U        0.49999        0.49887        0.49876            1204.2       629.53
L        0             -0.99773        0                  1105.6       622.93
U        0             -0.49887        0                   659.7       210.35
U       -0.49999        0.49887       -0.49876             641.2       197.34
L       -0.49999       -0.99773        0.49876             624.8       165.17
U        0              0             -0.49876             622.3       161.56
L       -0.49999       -0.49887        0                   589.1       127.79
U        0              0              0                   551.9       128.34
L       -0.49999       -0.49887        0.49876             541.1       210.68
L        0.49999       -0.99773       -0.49876             535.7       168.92
U       -0.49999       -0.49887        0.49876             522.9       142.40
L        0              0.49887       -0.49876             365.6       115.00
L       -0.99997        0              0                   354.8       111.54
L        0.49999        0             -0.49876             325.9        69.87
U       -0.49999        0              0                   292.2        85.33
U        0.49999        0              0                   275.1        85.93
L        0.49999       -0.49887       -0.99752             257.9        94.70
L        0.49999       -0.49887       -0.49876             222.8       146.88
L       -0.49999        0              0                   220.9        83.20
L        0             -0.49887        0                   210.3        60.69
U        0              0              0.49876             206.6        51.84
L        0.49999        0.49887        0.49876             158.3        43.76
L        0              0              0.49876             157.7        36.82
L        0.49999        0.49887        0                   137.3        26.89
L        0              0              0                   131.5        36.91
U        0              0.49887        0                   129.8        36.86
L        0.49999       -0.49887        0                   129.1        23.82
L       -0.49999        0.99773       -0.49876             125.7        28.57
L        0.49999        0              0                   122.4        44.82
L        0.99997        0              0                   114.2        26.20
L       -0.49999        0.49887       -0.99752             113.2        24.71
L       -0.49999        0              0.49876             110.6        15.08
L        0              0              0.99752             109.1        19.74
L        0              0.49887        0                    94.2        27.90
L       -0.49999        0.49887       -0.49876              83.5        36.19
L       -0.49999       -0.49887        0.99752              82.6        15.22
L        0.99997        0.99773        0.99752              75.9        18.20
L       -0.49999        0.49887        0                    72.9        13.58
L        0.49999        0.99773        0.49876              72.8        17.72
L        0              0.99773        0                    66.8        17.62
L        0.99997        0.49887        0.49876              65.3        18.33
L        0             -0.49887        0.49876              65.3        13.57
L        0.49999        0.49887        0.99752              62.3        14.30
L        0              0.49887        0.49876              58.9        14.36
L        0.49999        0              0.49876              56.9        12.69
L        0.99997       -0.49887       -0.49876              40.7        15.65


Table 3.11: CPLEX Results using Spearman Correlation Induction

Correlation Values                               Mean        Standard     Not
$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$    NODES       Error        Solved
-------------  -------------  ---------------    --------    ---------    ------
 0.99997       -0.99773       -0.99752           42945.3     14333.93     3
-0.99997        0.99773       -0.99752           38482.7     14423.24     3
-0.99997        0              0                 33519.9     12393.77     2
 0              0             -0.99752           25838.9     10735.01     1
-0.99997       -0.99773        0.99752           17667.7      3934.72
-0.99997       -0.49887        0.49876           17310.8      8920.90     1
-0.49999       -0.99773        0.49876            4032.9      2320.58
 0.49999       -0.49887       -0.99752            2754.1      1501.03
-0.49999        0.49887       -0.99752            2368.1       781.46
 0.49999       -0.99773       -0.49876            1977.2       648.52
 0             -0.99773        0                  1755.6       728.72
-0.99997        0.49887       -0.49876            1441.0       676.44
 0.49999        0             -0.49876            1203.3       296.88
 0.99997       -0.49887       -0.49876             989.5       214.62
 0              0.49887       -0.49876             871.2       134.87
 0              0             -0.49876             797.0       110.24
 0.49999       -0.49887       -0.49876             660.7       117.69
-0.49999        0             -0.49876             591.6       123.42
 0.49999        0.49887        0                   545.3        88.25
 0.49999        0              0                   544.9        59.64
 0              0              0                   510.2        58.84
 0.49999       -0.49887        0                   481.8        83.92
 0.99997        0              0                   477.1        95.03
-0.49999        0.99773       -0.49876             474.9        87.02
 0             -0.49887        0                   473.3        48.62
 0              0.49887        0                   446.8        53.61
-0.49999        0.49887       -0.49876             428.0        56.26
 0             -0.49887       -0.49876             416.0        69.71
 0.49999        0.49887        0.49876             407.7        64.65
-0.49999        0              0                   370.9        52.23
 0              0              0.49876             367.7        47.56
 0.99997        0.49887        0.49876             355.9        57.60
 0             -0.49887        0.49876             352.3        49.64
-0.49999       -0.49887        0.49876             345.9        34.33
-0.49999       -0.49887        0.99752             342.6        40.79
 0              0.99773        0                   297.5        63.24
-0.49999       -0.49887        0                   262.3        41.00
 0              0              0.99752             237.2        57.48
 0.49999        0.49887        0.99752             213.9        68.99
 0              0.49887        0.49876             212.2        43.64
-0.49999        0.49887        0                   204.5        37.19
 0.49999        0              0.49876             189.8        33.10
 0.99997        0.99773        0.99752             182.8        31.32
-0.49999        0              0.49876             145.4        29.24
 0.49999        0.99773        0.49876              87.9        18.58

several correlation structures listed in each table and those of the remaining
correlation structures. Not surprisingly, the correlation structures with the largest
average NODES include the unsolved problem instances (250,000 NODES for each
problem included in the average).

The ranked results in Tables 3.10 and 3.11 highlight patterns among the
correlation structures. Though the correlation structures listed at the top of Table 3.10
are Type L distributions, when both Type L and Type U distributions were
available for a particular correlation structure, the Type U distribution yielded the
more  difficult  problems.  There  are  more  negative  values  of  in  the  correlation 
structures  listed  in  the  top  portion  of  Tables  3.10  and  3.11  than  in  the  bottom 
portion.  The  potential  influence  of  the  individual  correlation  terms  is  examined  in 
the next subsection. Finally, challenging problems seem to have larger differences
between the values of the correlation terms within a correlation structure.


3.7.2  Individual  correlation  term  influence 

This experiment contains three subsets of design points in which each correlation
term is specified at all five design settings while the two remaining correlation terms
are zero. These subsets allow one to determine whether changes in that particular
correlation term influence CPLEX performance and which correlation terms have
the greatest relative influence. For each correlation term, a null hypothesis of no
influence was tested using the KW test. The test statistics and corresponding
p-values are provided in Table 3.12. For both correlation measures, the $\rho_{CA^2}$ and
$\rho_{A^1A^2}$ terms have a significant influence on CPLEX performance, while the $\rho_{CA^1}$

Table 3.12: Results of Kruskal-Wallis Tests on Each Correlation Term

(a) Pearson Measure

Correlation         Test
Term                Statistic    p-value
----------------    ---------    -------
$\rho_{CA^1}$        1.2304      0.873
$\rho_{CA^2}$       18.2345      0.001
$\rho_{A^1A^2}$     14.1310      0.007

(b) Spearman Measure

Correlation         Test
Term                Statistic    p-value
----------------    ---------    -------
$\rho_{CA^1}$        4.5899      0.332
$\rho_{CA^2}$       11.6578      0.020
$\rho_{A^1A^2}$     13.6610      0.008


term does not. A possible explanation is that the distribution for $A^1$ has a larger
mean and variance than the distribution for $A^2$. The larger variance of $A^1$ then
produces a larger variance in the right-hand side coefficient for the first constraint,
$b_1$, which in turn produces more problems with loose constraints. As will be shown
in the next subsection, loose constraints make for easier problems, thereby reducing
the influence of the $\rho_{CA^1}$ term on CPLEX performance.

The lack of significance of $\rho_{CA^1}$ seems to conflict with Table 3.11, which lists
$\rho = (-0.99997, 0, 0)$ as a challenging correlation structure. The large average NODES
associated with this structure is due to the extremely difficult problems that result
when this structure is coupled with slackness setting $(S_1, S_2) = (0.30, 0.70)$; average
NODES is 133,206 for these 5 problems. Interestingly, for the same correlation
structure when $(S_1, S_2) = (0.70, 0.30)$, average NODES is just 45. A KW test is a
ranks test that is not necessarily affected by such extreme data points. However,
this example demonstrates that constraint slackness settings likely represent a
significant influence on CPLEX.
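
The link between the constraint coefficients and the right-hand sides can be sketched as follows. This is an illustration under an explicit assumption: the right-hand side of constraint $i$ is taken to be the slackness ratio $S_i$ times the sum of that constraint's coefficients, a common convention for synthetic knapsack problems that is assumed here rather than quoted from this section. The function name and coefficient values are hypothetical.

```python
def right_hand_sides(constraint_rows, slackness):
    """Right-hand sides b_i = S_i * sum_j a_ij (assumed convention, see lead-in)."""
    return [s * sum(row) for row, s in zip(constraint_rows, slackness)]


# Hypothetical coefficients for a two-constraint problem with four variables.
a1 = [12, 3, 18, 7]      # first constraint: wider range of coefficient values
a2 = [4, 6, 5, 5]        # second constraint: narrower range
print(right_hand_sides([a1, a2], (0.30, 0.70)))
```

Under this convention, coefficients drawn from a distribution with a larger variance produce right-hand sides that also vary more from problem to problem.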

3.7.3 Constraint slackness influences

Table 3.13 summarizes CPLEX performance by constraint slackness settings. Tight
constraints, and in particular a tight first constraint (i.e., $S_1 = 0.3$), seem to
produce the more challenging test problems. The data are grouped by slackness
settings for a KW test for a difference in CPLEX performance due to constraint
slackness setting. The KW test statistics of 76.76 for Pearson correlation test
problems and 235.07 for Spearman correlation test problems equate to p-values
of less than $1.0 \times 10^{-10}$ in each test. So, constraint slackness settings represent a
significant influence on CPLEX performance.

Notice the mean NODES in Table 3.13(a) and 3.13(b) for $(S_1, S_2) = (0.30, 0.70)$
and $(S_1, S_2) = (0.70, 0.30)$ are quite different. The standard errors listed do not
suggest the means are different. However, by pairing observations with $(S_1, S_2) =
(0.30, 0.70)$ with the corresponding observations having $(S_1, S_2) = (0.70, 0.30)$,
a sign test may be conducted for a null hypothesis of no difference in CPLEX
performance for mixed slackness settings. The sign test results are provided in
Table 3.14 for each correlation measure along with the $\alpha = 0.05$ acceptance regions.

Table 3.13: Mean NODES by Constraint Slackness Setting

(a) Pearson Measure

$S_1$    $S_2$    Mean NODES
-----    -----    ----------
0.30     0.30     3524.17
0.30     0.70     3191.36
0.70     0.30     2299.12
0.70     0.70      335.34

(b) Spearman Measure

$S_1$    $S_2$    Mean NODES
-----    -----    ----------
0.30     0.30     4290.87
0.30     0.70     6307.78
0.70     0.30     3110.35
0.70     0.70     1286.33

Table 3.14: Sign Test Results for Performance Differences Between Mixed
Constraint Slackness Levels

Correlation         Total            Total         Acceptance
Measure             $d_i \neq 0$     $d_i > 0$     Region        p-value
----------------    -------------    ----------    ----------    --------
Pearson Measure     273              177           (120,153)     < 0.0001
Spearman Measure    278              201           (123,155)     < 0.0001


The p-values indicate that, with a mixed constraint slackness setting, a tight first
constraint produces a more challenging problem. CPLEX performance appears
more sensitive to low $S_1$ values than to low $S_2$ values given mixed slackness settings
and the distributions specified for $A^1$ and $A^2$.


3.7.4  The  interaction  between  correlation  structure  and 
constraint  slackness 

The information contained in Tables 3.15 through 3.18 provides some insight into
the interaction between correlation and constraint slackness for each correlation
measure. Table 3.15 lists average NODES for each setting of $\rho_{CA^1}$ and $S_1$ and for
each setting of $\rho_{CA^2}$ and $S_2$. Regardless of slackness value, extreme negative
correlation between objective function and constraint coefficients yields problems that
challenge CPLEX. For the experiment design points selected, $\rho_{A^1A^2} < 0$ whenever
$\rho_{CA^1} < 0$ or $\rho_{CA^2} < 0$. So the effect of negative $\rho_{CA^1}$ or $\rho_{CA^2}$, which conflicts with
other studies on similar types of problems, may be better attributed to $\rho_{A^1A^2} < 0$.
There is also a tendency for the combination of extreme positive correlation and a
loose constraint to yield problems that challenge CPLEX.

Table 3.15: Interaction of Constraint Slackness and Correlation Type on Average
NODES

(a) Pearson Measure

First Constraint Effects
$\rho_{CA^1}$    $S_1 = 0.30$    $S_1 = 0.70$
-------------    ------------    ------------
-0.99997         19668.6         17803.9
-0.49999           575.2           208.0
 0.0              3833.8           208.9
 0.49999           872.7           141.6
 0.99997            82.4         11211.6

Second Constraint Effects
$\rho_{CA^2}$    $S_2 = 0.30$    $S_2 = 0.70$
-------------    ------------    ------------
-0.99773         14994.5          5693.8
-0.49887          5115.6           513.9
 0.0               625.6           390.5
 0.49887           352.1           367.2
 0.99773            54.1         10182.5

(b) Spearman Measure

First Constraint Effects
$\rho_{CA^1}$    $S_1 = 0.30$    $S_1 = 0.70$
-------------    ------------    ------------
-0.99997         40558.7         35594.5
-0.49999           815.0           804.3
 0.0              3427.7           377.9
 0.49999          1057.8           451.1
 0.99997          1210.0         12271.4

Second Constraint Effects
$\rho_{CA^2}$    $S_2 = 0.30$    $S_2 = 0.70$
-------------    ------------    ------------
-0.99773         19611.6          7739.9
-0.49887           875.8          2819.8
 0.0              3375.9          4111.3
 0.49887           694.0           559.8
 0.99773          5286.8         10523.5



Table  3.16  lists  average  NODES  by  slackness  setting  for  each  level  of  intercon¬ 
straint  correlation.  Extreme  negative  correlation  between  constraint  coefficients 
yields  challenging  problems.  The  average  NODES  values  are  relatively  high  for 
extreme  positive  values  of  when  =  0.30.  With  a  mixed  slackness  setting 
and  extreme  negative  interconstraint  correlation,  the  average  NODES  is  also  very 
high. The “bump” in Table 3.16(a) for $\rho_{A^1A^2} = -0.49876$ and tight constraints
is caused by a particularly challenging design point, $\rho = (0, -0.49887, -0.49876)$
and $(S_1, S_2) = (0.30, 0.30)$, for which two test problems were not solved within the
250,000 NODES limit.

Space prohibits listing the performance averages for all 224 design points, so
Tables 3.17 and 3.18 list just the three design points at each extreme level of
performance. Notice the disparity of CPLEX performance between the design points
Table 3.16: Average NODES by Inter-Constraint Slackness and Correlation

                   $S_1 = 0.30$   $S_1 = 0.70$                       $S_1 = 0.30$   $S_1 = 0.70$
$\rho_{A^1A^2}$    $S_2 = 0.30$   $S_2 = 0.70$    $\rho_{A^1A^2}$    $S_2 = 0.30$   $S_2 = 0.70$
-0.99752           33393.4        11110.9         -0.99752           655.9          1337.1
-0.49876           1564.7         483.4           -0.49876           10888.3        289.9
 0.0               975.2          307.7            0.0               497.1          177.4
 0.49876           520.4          266.7            0.49876           565.4          282.3
 0.99752           5315.4         200.0            0.99752           4954.9         177.7

                   $S_1 = 0.30$   $S_1 = 0.70$                       $S_1 = 0.30$   $S_1 = 0.70$
$\rho_{A^1A^2}$    $S_2 = 0.70$   $S_2 = 0.30$    $\rho_{A^1A^2}$    $S_2 = 0.70$   $S_2 = 0.30$
-0.99752           22411.3        22995.7         -0.99752           20260.3        22367.2
-0.49876           690.3          614.7           -0.49876           655.5          204.4
 0.0               7699.1         403.8            0.0               191.2          302.6
 0.49876           5225.6         1149.9           0.49876           648.3          188.67
 0.99752           3954.4         5445.6           0.99752           11144.0        1193.1

(a) Pearson Measure                               (b) Spearman Measure


listed in Tables 3.17(a) and 3.18(a) versus those in Tables 3.17(b) and 3.18(b).
Problems requiring more NODES involve negative correlation and tight constraints,
while problems requiring fewer NODES involve primarily positive correlation values
and loose constraints. Further, it appears that problems with large differences
between the values of correlation terms require more NODES.

One way to generate a difficult 2KP instance is to induce extreme negative
correlation between the objective function and a tight constraint. Another approach
to creating difficult problems is to induce negative correlation between
constraint coefficients, such as specifying either of the extreme correlation structures
$\rho = (0.99997, -0.99773, -0.99752)$ or $\rho = (-0.99997, 0.99773, -0.99752)$.
Conversely, one can create easier problems by avoiding any negative correlation in
the population correlation structure and making the constraints loose.
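
How a slackness setting turns into a tight or loose constraint can be sketched as follows. The sketch assumes the common convention that the right-hand side of constraint i is the slackness ratio times the sum of that constraint's coefficients; this section does not restate the definition, so treat the formula, the function name, and the coefficient values as hypothetical.

```python
def rhs_from_slackness(constraint_rows, slackness):
    """Right-hand sides under the assumed convention b_i = S_i * sum_j a_ij."""
    return [s * sum(row) for row, s in zip(constraint_rows, slackness)]

# Hypothetical coefficients for a two-constraint knapsack with three items
a = [[10, 20, 30],
     [5, 5, 10]]

# A tight first constraint (0.30) and a loose second constraint (0.70)
b = rhs_from_slackness(a, (0.30, 0.70))
print(b)
```

Under this convention a smaller ratio leaves room for fewer items, which is why the 0.30 settings are described as tight.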
Table 3.17: Design Points Requiring Most and Least Average NODES for Pearson
Correlation Problems

(a) Design Points Averaging Most NODES

$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$  $S_1$  $S_2$  Mean       Std Error
 0.0           -0.49887       -0.49876         0.30   0.30   119724.4   55912.53
 0.99997       -0.99773       -0.99752         0.70   0.30   111466.4   55586.62
-0.99997        0.99773       -0.99752         0.30   0.70   100801.2   60914.42

(b) Design Points Averaging Least NODES

$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$  $S_1$  $S_2$  Mean       Std Error
 0.99997        0.49987        0.49876         0.70   0.30   3.8        1.96
 0.49999        0.0            0.0             0.70   0.30   3.8        2.35
 0.99997       -0.49987       -0.49876         0.70   0.30   4.4        2.16

Table 3.18: Design Points Requiring Most and Least Average NODES for Spearman
Correlation Problems

(a) Design Points Averaging Most NODES

$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$  $S_1$  $S_2$  Mean       Std Error
 0.99997       -0.99773       -0.99752         0.70   0.30   113852.0   56803.73
-0.99997        0.99773       -0.99752         0.30   0.70   103109.6   60021.89
 0.0            0.0           -0.99752         0.30   0.30   99214.6    50753.77

(b) Design Points Averaging Least NODES

$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$  $S_1$  $S_2$  Mean       Std Error
 0.0            0.49987        0.49876         0.70   0.70   22.8       9.19
 0.49999        0.99773        0.49876         0.70   0.70   23.0       18.03
-0.99997        0.0            0.0             0.70   0.70   27.6       20.08
20.08 
3.7.5 Regression models for NODES

This section describes the regression models fit to the experiment data to describe
the relationship between the experiment design factors and the performance measure
NODES. These models were developed using a stepwise regression procedure
that maximizes the coefficient of determination, $R^2$. The transformed response in
each model is the natural logarithm of NODES.

Define disparity, $D$, as the largest absolute deviation between any two of the
correlation terms. Table 3.19 lists the best regression model for each correlation
measure. For each term significant at the $\alpha = 0.05$ level, the regression model
coefficient is provided, while p-values are provided for the remaining terms in the
model. Despite the transformation on NODES, both regression models have low
values for $R^2$.
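
The disparity factor and the candidate regression terms can be sketched in a few lines. The term list below (six main effects and their fifteen pairwise products) mirrors the row labels of Table 3.19; the function names and dictionary keys are hypothetical, and the sketch does not reproduce the stepwise selection or the fitted coefficients.

```python
from itertools import combinations

def disparity(rho):
    """Largest absolute difference between any two correlation terms."""
    return max(abs(x - y) for x, y in combinations(rho, 2))

def model_terms(s1, s2, rho):
    """Main effects plus all pairwise interactions for one design point.

    rho is (rho_CA1, rho_CA2, rho_A1A2); key names are illustrative only.
    """
    factors = {
        "S1": s1, "S2": s2,
        "rho_CA1": rho[0], "rho_CA2": rho[1], "rho_A1A2": rho[2],
        "D": disparity(rho),
    }
    terms = dict(factors)
    for (name1, v1), (name2, v2) in combinations(factors.items(), 2):
        terms[f"{name1} x {name2}"] = v1 * v2
    return terms

# One of the extreme design points listed in Table 3.17(a)
point = model_terms(0.70, 0.30, (0.99997, -0.99773, -0.99752))
print(round(point["D"], 5))   # 1.9977
print(len(point))             # 21 candidate terms (intercept not counted)
```

A response such as the natural logarithm of NODES would then be regressed on a subset of these terms chosen by the stepwise procedure.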

The model for the Pearson measure is similar to that for the Spearman measure.
There are ten significant factors in common, each significant term having the
same sign. The factor $D$ is significant, supporting earlier observations regarding
its influence. The coefficient for the constraint slackness factor indicates that loose
constraints tend to reduce NODES. Also supporting earlier findings are the significant
interaction terms on constraint slackness and correlation setting. Neither of
the $\rho_{CA^1}$ and $\rho_{CA^2}$ factors is significant, although the sign test previously found
$\rho_{CA^2}$ significant. This may be due to these terms being correlated with $D$, and the
interaction terms involving $\rho_{CA^1}$ and $\rho_{CA^2}$ being significant factors in the model.


Table 3.19: Regression Model of CPLEX Results

Pearson  Measure 

Spearman  Measure 

LN(NODES) 

LN(NODES) 

Source 

Coefficient  p-value 

Coefficient  p-value 

Intercept 

9.20 

10.09 

-9.02 

-8.36 

^2 

-5.25 

-5.67 

PCA^ 

0.725 

0.465 

PCA^ 

0.637 

0.323 

-1.56 

-1.25 

D 

-1.89 

-0.99 

Si  X  ^2 

10.61 

10.24 

Si  X  PcA^ 

2.49 

2.51 

Si  X  PCA^ 

-2.19 

-1.15 

Si  X 

0.583 

0.149 

S2  X  PcA^ 

-1.54 

0.243 

S2  X  PcA'^ 

1.83 

1.63 

S2  X  pa^a'^ 

1.48 

1.25 

PCA^  X  pcA'^ 

0.152 

PCA^  X 

-0.82 

0.181 

PCA'2  X  p^i^2 

0.179 

-1.95 

5iX  D 

1.98 

5'2X  D 

0.394 

PCA^X  D 

-0.51 

0.105 

PcA'^X  D 

0.117 

-0.77 

Pa^a'^x  D 

0.323 

-0.41 

$R^2 = 0.216$

$R^2 = 0.255$
3.8 Analysis of heuristic performance

In  this  section,  the  influence  of  population  correlation  structure  and  constraint 
slackness  settings  on  TOYODA  performance  is  examined.  The  measure  of  perfor¬ 
mance  for  the  analysis  is  REL.  Two  regression  models  are  constructed  to  summarize 
the  influence  of  constraint  slackness  and  correlation  on  TOYODA  performance. 


3.8.1  Correlation  structure  influence 

Tables 3.20 and 3.21 summarize TOYODA performance by population correlation
structure for Pearson and Spearman problems, respectively. Past research has
shown that TOYODA generally provides very good solutions, and the results in
this study support this point. Although there is no drastic change between any two
consecutive correlation structures listed in the tables, there is a noticeable difference
in REL averages and the standard errors between those correlation structures
at the top and those at the bottom of Tables 3.20 and 3.21. The $\rho_{A^1A^2}$ term appears
to be a particularly important factor since the correlation structures with
the larger average REL values generally include negative values of $\rho_{A^1A^2}$ while the
opposite holds for those correlation structures with smaller average REL values.
Independent sampling ($\rho = (0, 0, 0)$, Type U in Table 3.20) does not appear to
provide particularly difficult problems. In fact, the average REL with independent
sampling is at the mean and just above the median value of average REL over all
design points. Table 3.20 specifies REL by distribution type, Type L or Type U.
When both distributions were available for a particular correlation structure, the
Type U distribution yielded the more difficult problems. This means Type U
distributions might be preferable for generating difficult 2KP problem instances.

Of the 2240 test problems generated, TOYODA found an optimal solution in
156. Tables 3.20 and 3.21 list the number of optimal solutions found for each
population correlation structure. A common feature among most of these correlation
structures is non-positive values for $\rho_{CA^1}$ and $\rho_{CA^2}$ and non-negative values
for $\rho_{A^1A^2}$. Consider the population correlation structure $\rho = (-0.99997, -0.99773,
0.99752)$ listed at the bottom of Table 3.20 and near the top of Tables 3.10 and
3.11. This structure produces challenging test problems for the CPLEX procedure,
but for 37 of the 40 test problems generated with this correlation structure, TOYODA
found an optimal solution. This is interesting because the problems that are
challenging for one procedure may not be challenging for the other.

A formal test of no performance differences due to correlation structure is conducted
using the KW test on the data grouped by correlation structure. The KW
test statistics of 328.32 for Pearson correlation problems and 402.46 for Spearman
correlation problems have p-values near zero. So, the correlation structure is a
significant influence on TOYODA performance.
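
For readers unfamiliar with the KW (Kruskal-Wallis) test, the statistic can be computed from pooled ranks as in the minimal sketch below. It uses mid-ranks for ties and omits the tie correction, and the sample values are toy numbers chosen for illustration; the REL observations from this study are not reproduced here.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using mid-ranks (no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Mid-rank of each distinct value: the average of the 1-based positions it occupies.
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    # Sum over groups of (rank sum)^2 / group size
    total = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

# Toy data: three groups, as if REL values were grouped by correlation structure
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(toy_groups)
print(round(h, 4))   # 7.2
```

Large values of H, referred to a chi-square distribution with one fewer degree of freedom than the number of groups, indicate that at least one group tends to produce larger values than the others.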


3.8.2  Individual  correlation  term  influence 

This  experiment  includes  three  subsets  of  design  points  in  which  each  correlation 
term  is  specified  at  all  design  settings  while  the  two  remaining  correlation  terms  are 
specified  as  zero.  These  subsets  may  be  used  to  determine  whether  each  particular 
Table 3.20: TOYODA Results using Pearson Correlation Induction


Table 3.21: TOYODA Results using Spearman Correlation Induction

$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$  Mean REL  Standard Error  Solved to CPLEX Value
-0.49999        0.49887       -0.99752         3.29      0.325
 0.99997       -0.49887       -0.49876         3.24      0.471           2
 0.49999       -0.49887       -0.99752         3.15      0.342
-0.49999        0.99773       -0.49876         2.99      0.406           1
 0              0.49887       -0.49876         2.77      0.359
-0.49999        0.49887       -0.49876         2.73      0.233
 0.99997        0              0               2.64      0.451           1
 0.49999        0.99773        0.49876         2.53      0.305
-0.99997        0.99773       -0.99752         2.48      0.425
 0.49999       -0.49887       -0.49876         2.37      0.183
 0              0.99773        0               2.35      0.337
 0.49999        0             -0.49876         2.28      0.245
 0.99997       -0.99773       -0.99752         2.24      0.419
 0.99997        0.49887        0.49876         2.24      0.289
 0              0.49887        0               2.14      0.213
-0.49999        0.49887        0               1.86      0.311
 0.49999       -0.49887        0               1.75      0.314           1
 0.49999        0              0               1.65      0.153
-0.99997        0.49887       -0.49876         1.64      0.281
 0              0             -0.99752         1.63      0.226           1
 0              0.49887        0.49876         1.60      0.312           2
 0             -0.49887       -0.49876         1.53      0.191
 0.49999       -0.99773       -0.49876         1.53      0.242
 0.49999        0.49887        0               1.32      0.140           1
-0.49999        0              0               1.29      0.140
 0.49999        0              0.49876         1.18      0.172
 0             -0.99773        0               1.16      0.221           1
 0             -0.49887        0               1.16      0.135
-0.49999        0             -0.49876         1.15      0.172
 0              0             -0.49876         1.14      0.105           2
-0.99997        0              0               1.01      0.217
-0.49999        0              0.49876         0.75      0.129
 0             -0.49887        0.49876         0.71      0.152           5
 0.49999        0.49887        0.49876         0.69      0.053           1
 0              0              0               0.69      0.048           1
 0.99997        0.99773        0.99752         0.69      0.101           1
 0              0              0.49876         0.61      0.087           2
-0.49999       -0.49887        0.49876         0.58      0.087           7
-0.49999       -0.49887        0               0.54      0.069           2
-0.49999       -0.99773        0.49876         0.46      0.085           1
 0.49999        0.49887        0.99752         0.36      0.037
-0.99997       -0.49887        0.49876         0.36      0.074           6
 0              0              0.99752         0.18      0.036           8
-0.49999       -0.49887        0.99752         0.16      0.023           5
-0.99997       -0.99773        0.99752         0.07      0.044           17
Table 3.22: Results of Kruskal-Wallis Tests on Each Type of Correlation for REL

Type of Correlation    Test Statistic    p-value
$\rho_{CA^1}$          26.787            < 0.0001
$\rho_{CA^2}$          30.930            < 0.0001
$\rho_{A^1A^2}$        36.058            < 0.0001

(a) Pearson Measure

Type of Correlation    Test Statistic    p-value
$\rho_{CA^1}$          17.205            0.0018
$\rho_{CA^2}$          17.193            0.0018
$\rho_{A^1A^2}$        43.79             < 0.0001

(b) Spearman Measure


correlation  term  influences  TOYODA  performance  and  which  correlation  terms 
have  the  greatest  influence.  For  each  correlation  term,  a  null  hypothesis  of  no 
influence  was  tested  using  the  KW  test.  The  test  statistics  and  corresponding  p- 
values  are  provided  in  Table  3.22.  In  each  case  the  null  hypothesis  is  rejected,  and 
the  conclusion  is  that  each  correlation  term  influences  TOYODA  performance. 

The test statistics for $\rho_{A^1A^2}$ under both correlation measures imply that this term is
a particularly significant factor influencing TOYODA procedure performance. As noted
earlier, many of the previous studies on MKP heuristics include TOYODA as a benchmark
procedure. Past research with induced correlation involves high positive values for
each of $\rho_{CA^1}$ and $\rho_{CA^2}$, which then implies a high positive value of $\rho_{A^1A^2}$, such as
was the case in Balas and Martin (1980). However, this study finds that $\rho_{A^1A^2} < 0$
actually produces the more challenging test problems for TOYODA.

CPLEX and TOYODA results differ regarding the influence of the $\rho_{CA^1}$ and
$\rho_{CA^2}$ terms. While neither term significantly influenced CPLEX, both strongly
influence TOYODA performance. This is reconciled by noting that TOYODA
transforms each 2KP so that $b_i = 1$, $i = 1, 2$. Thus, the influence of the mean and
variance of $A^i$ in countering the influence of each correlation term is mitigated.
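
The transformation to unit right-hand sides can be sketched as dividing every coefficient in a constraint by that constraint's right-hand side. This is only an illustration of the normalization step, not of the full TOYODA heuristic, and the coefficient values and function name are hypothetical.

```python
def normalize_constraints(rows, rhs):
    """Divide each constraint row by its right-hand side so every right-hand side is 1."""
    scaled = [[a / b for a in row] for row, b in zip(rows, rhs)]
    return scaled, [1.0] * len(rhs)

# Hypothetical two-constraint knapsack with three items
rows = [[10, 20, 30],
        [5, 5, 10]]
rhs = [18, 14]

scaled_rows, unit_rhs = normalize_constraints(rows, rhs)

# Feasibility is unchanged: a selection satisfies the original constraints
# exactly when it satisfies the scaled ones.
x = [1, 0, 0]
original_ok = all(sum(a * xi for a, xi in zip(r, x)) <= b for r, b in zip(rows, rhs))
scaled_ok = all(sum(a * xi for a, xi in zip(r, x)) <= 1.0 for r in scaled_rows)
print(original_ok, scaled_ok)   # True True
```

After scaling, both constraints are expressed as fractions of available capacity, so differences in the scale of the two coefficient distributions no longer affect how the constraints are weighed against each other.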
3.8.3 Constraint slackness influence

Table 3.23 lists the average REL by constraint slackness settings. Clearly, $(S_1, S_2) =
(0.30, 0.30)$ represents the most difficult type of problem for TOYODA in terms
of overall average REL. The differences among the other average REL values are
quite small, as are their standard errors. A KW test is used to test for a difference
in REL due to slackness setting. The KW test statistics of 192.09 for Pearson
correlation problems and 178.49 for Spearman correlation problems have p-values
near zero. So constraint slackness settings have a statistically significant influence
on TOYODA performance, which agrees with past research.

Unlike the results from CPLEX presented in Table 3.13, there does appear to be
a significant difference in REL values when $(S_1, S_2) = (0.30, 0.70)$ as compared to
REL values when $(S_1, S_2) = (0.70, 0.30)$. Table 3.24 presents the sign test results
for each correlation measure when slackness settings are mixed, with the $\alpha = 0.05$
acceptance regions for the test provided. These results indicate that when mixed
slackness settings are involved, and the problems are based on the Pearson measure,
a tight first constraint tends to be associated with more challenging problems. For
the problems based on the Spearman measure, both mixed slackness settings yield
problems of about equal REL. However, because TOYODA normalizes each constraint,
the interpretation of these results is not as straightforward as for CPLEX
results.

Past heuristic research has not addressed the influence of mixed slackness levels.
Balas and Martin (1980) use randomly generated slackness levels but do not
examine the effects of mixed levels on heuristic procedure performance.
Table 3.23: Mean REL by Constraint Slackness Setting

$S_1$    $S_2$    Mean REL
0.30     0.30     1.53
0.30     0.70     0.42
0.70     0.30     0.58
0.70     0.70     0.54

(a) Pearson Measure

$S_1$    $S_2$    Mean REL
0.30     0.30     2.95
0.30     0.70     0.95
0.70     0.30     1.00
0.70     0.70     1.10

(b) Spearman Measure


Table 3.24: Sign Test Results for Performance Differences Between Mixed Constraint
Slackness Levels

Correlation Measure    Total    Total d_i > 0    Acceptance Region    p-value
Pearson Measure        273      162              (120,153)            0.0008
Spearman Measure       276      151              (122,154)            0.0520

3.8.4 The interaction between correlation structure and
constraint slackness

While  previous  studies  have  examined  the  influence  of  constraint  slackness  set¬ 
tings,  none  have  been  able  to  examine  the  interaction  between  constraint  slackness 
settings  and  population  correlation  structure. 

Figures 3.4 through 3.7 plot average REL values for various slackness-correlation
combinations. In each plot, the correlation values on the X-axis are
rounded for ease of presentation. Figures 3.4 and 3.5 plot results for Pearson problems;
Figures 3.6 and 3.7 plot results for Spearman problems. In the two plots
shown in both Figures 3.4 and 3.6, average REL values are plotted versus the correlation
value between coefficients of the objective function and each constraint,
for both tight and loose constraints. REL averages tend to vary directly with the
correlation values, meaning that the more challenging problems have higher correlation
values. At each correlation value, REL averages were lower for the larger
slackness values, which means that better solutions are found for problems with
looser constraints. Finally, the increasing differences between the REL values plotted
for increasing correlation values indicate there is an interaction effect between
the correlation and constraint slackness factors on TOYODA performance.

Figures 3.5 and 3.7 plot REL against the interconstraint correlation for tight
and loose slackness settings for Pearson and Spearman problems, respectively. The
trend is for decreasing values of $\rho_{A^1A^2}$ to result in increasing average REL values. At
negative values of $\rho_{A^1A^2}$ and $(S_1, S_2) = (0.30, 0.30)$, there is quite a large difference
in average REL as compared to when $(S_1, S_2) = (0.70, 0.70)$. Finally, these plots
provide strong evidence of an interaction between correlation and the constraint
slackness setting.

Space prohibits listing the performance averages for all 224 design points, so
the three design points with the best and worst levels of performance are listed in
Tables 3.25 and 3.26. These results are what would be expected after examining
Figures 3.4 through 3.7. Both constraints being tight yields harder problems,
particularly when combined with negative values of $\rho_{A^1A^2}$. Test problems with the
worst REL averages have $\rho_{CA^1} > 0$, $\rho_{CA^2} \ge 0$, and $\rho_{A^1A^2} < 0$. Within the correlation
structures for the easier problems, the largest absolute difference between
any  correlation  terms  seems  larger  than  in  the  correlation  structure  of  the  harder 
problems.  The  influence  of  this  phenomenon  is  examined  later  using  a  regression 
model.  Tables  3.25  and  3.26,  though  purposely  not  all  inclusive,  illustrate  the 
range  of  TOYODA  performance,  which  is  not  so  apparent  in  the  data  presented 
in  Tables  3.20  and  3.21. 


Figure 3.4: Pearson Correlation Measure - REL x Correlation Setting

Figure 3.5: Inter-Constraint Correlation - REL x Correlation Setting

Figure 3.6: Spearman Correlation Measure - REL x Correlation Setting

Figure 3.7: Inter-Constraint Correlation - REL x Correlation Setting
Table 3.25: Design Points with Extreme REL Averages for Pearson Correlation
Problems

(a) Design Points Averaging Worst REL

$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$  $S_1$  $S_2$  Mean       Std Error
-0.49999        0.49887       -0.49876         0.30   0.30   4.6        0.73
 0.0            0.0           -0.99752         0.30   0.30   4.2        1.06
 0.0           -0.49887       -0.49876         0.30   0.30   4.1        0.71

(b) Design Points Averaging Best REL

$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$  $S_1$  $S_2$  Mean       Std Error
-0.99997       -0.99773        0.99752         Any    Any    0.0        0.0
-0.99997       -0.49887        0.49876         0.30   0.70   0.0        0.0
-0.99997        0.0            0.0             0.30   0.30   0.01       0.005

Table 3.26: Design Points with Extreme REL Averages for Spearman Correlation
Problems

(a) Design Points Averaging Worst REL

$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$  $S_1$  $S_2$  Mean       Std Error
-0.49999        0.49887       -0.49876         0.30   0.30   6.7        0.66
 0.99997        0.0            0.0             0.30   0.30   6.4        1.32
 0.0            0.49887       -0.49876         0.30   0.30   6.1        0.94

(b) Design Points Averaging Best REL

$\rho_{CA^1}$  $\rho_{CA^2}$  $\rho_{A^1A^2}$  $S_1$  $S_2$  Mean       Std Error
-0.99997       -0.99773        0.99752         0.70   0.70   0.0        0.0
-0.99997       -0.99773        0.99752         0.30   0.70   0.0        0.0
-0.99997       -0.99773        0.99752         0.30   0.30   0.01       0.006
3.8.5 Regression models for REL

Table 3.27 contains the regression models developed to describe REL in terms of
the experiment design factors. For these models, a “best” regression is defined
as the model maximizing the value of $R^2$. The disparity term, $D$, represents the
largest absolute deviation between any two correlation terms within the correlation
structure. The coefficients of the significant terms in the regression model are
indicated; p-values are provided for the insignificant terms included in the model.

Each  model  contains  significant  terms  for  both  constraint  slackness  factors, 
which  agrees  with  the  KW  test  results  previously  presented.  Also  significant  are 
the  constraint  slackness  and  objective  function-constraint  correlation  interactions, 
which  is  the  trend  seen  in  Figures  3.4-3.7.  The  D  term  is  significant,  as  was  the  case 
with  the  regression  model  for  the  CPLEX  results.  Comparing  these  models  side- 
by-side  there  are  clear  differences  attributable  to  the  type  of  correlation  measure. 
Although  the  regression  models  have  nine  common  significant  terms,  and  most  of 
these  common  terms  agree  in  sign  (8  of  the  9),  the  magnitudes  of  most  of  these 
terms  differ. 

Although the TOYODA heuristic provides generally good solutions across a
wide range of problems, the procedure is sensitive to variations in correlation structure
and constraint slackness settings. With certain types of correlation structures,
TOYODA provides excellent solutions. Of particular interest is the influence of
negative values of $\rho_{A^1A^2}$ and the interaction between constraint slackness settings
and values specified for the correlation terms in the correlation structure.


Table 3.27: Regression Model of TOYODA Results

Pearson  Measure 

Spearman  Measure 

LN(REL) 

REL 

Source 

Coefficient  p-value 

Coefficient  p- value 

Intercept 

3.46 

6.61 

-5.23 

-8.81 

-6.21 

-9.27 

PCA^ 

0.124 

0.520 

PCA^ 

0.608 

1.05 

PA^A^ 

0.165 

-2.00 

Disparity 

-0.90 

0.69 

Si  X  S2 

8.83 

13.11 

Si  X  PcA^ 

-2.49 

-1.13 

Si  X  PCA'^ 

1.83 

0.113 

Si  X  pa^a^ 

0.180 

0.77 

S2  X  PcA^ 

3.26 

1.12 

S2  X  PCA^ 

-1.40 

-1.32 

S2  X 

0.761 

0.80 

PCA^  X  pc.^2 

0.124 

0.058 

PCA^  X  p^M2 

0.908 

0.361 

PCA2  X  Paia^ 

SiX  Disparity 

0.229 

S2X  Disparity 

0.787 

0.398 

P(7^iX  Disparity 

0.31 

0.32 

PcA^x  Disparity 

0.38 

0.134 

p^i^2X  Disparity 

0.205 

0.149 

$R^2 = 0.477$

$R^2 = 0.466$
3.9 Analysis of LP-IP Gap

The  size  of  the  LP-IP  gap  in  an  optimization  problem  is  sometimes  viewed  as  a 
factor  influencing  the  performance  of  solution  procedures  (Chang  and  Shepardson, 
1982).  Though  not  known  in  advance,  by  solving  the  test  problems,  one  can 
examine  how  the  experiment  design  factors  influence  the  size  of  the  LP-IP  gap. 

The  sign  test  is  used  to  test  the  null  hypothesis  of  no  difference  in  LP-IP  gap 
value  in  synthetic  test  problems  based  on  the  type  of  correlation  measure.  This 
is  done  by  separating  the  problems  by  the  correlation  measure  and  then  pairing 
the  problems  by  design  point  and  replication  number.  The  LP-IP  gap  was  larger 
for  Spearman  correlation-based  problems  in  875  of  the  1120  pairings  (16  pairings 
were  equal).  The  sign  test  results  indicate  that  problems  generated  based  on  the 
Spearman  correlation  structure  have  larger  LP-IP  gap  values.  Among  the  201  test 
problems  for  which  the  LP  solution  equaled  the  IP  solution,  only  51  problems  were 
generated  using  the  Spearman  correlation  measure. 

KW tests were used to test whether there were LP-IP gap differences due to
the individual correlation values within each correlation structure and whether the
constraint slackness settings matter. The results of these tests are presented in
Table 3.28. The constraint slackness setting is a very significant factor influencing
the size of the LP-IP gap for the test problems generated in this study. Problems
Table 3.28: KW Test Results for LP-IP Gap

                   Pearson Measure               Spearman Measure
Parameters         Test Statistic   p-value      Test Statistic   p-value
$\rho_{CA^1}$      28.45            < 0.0001     17.18            0.0017
$\rho_{CA^2}$      18.03            0.0012       14.89            0.0049
$\rho_{A^1A^2}$    15.22            0.0043       9.11             0.0584
Slackness          60.67            < 0.0001     78.60            < 0.0001

with tighter constraints tend to have larger LP-IP gap values. All the correlation
terms, with the exception of $\rho_{A^1A^2}$ for Spearman problems, significantly influenced
the LP-IP gap. Among the three correlation terms, the $\rho_{CA^1}$ term seems to be the
most significant.
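
Several of the correlation-term p-values in Table 3.28 are consistent with referring each KW statistic to a chi-square distribution with 4 degrees of freedom, that is, five levels per correlation term minus one. This reading is an assumption, since the text does not say how the p-values were computed. For 4 degrees of freedom the upper-tail probability has the closed form $e^{-H/2}(1 + H/2)$, which the sketch below evaluates; the function name is hypothetical.

```python
import math

def kw_p_value_df4(h):
    """Upper-tail probability of a chi-square variate with 4 degrees of freedom."""
    return math.exp(-h / 2.0) * (1.0 + h / 2.0)

# KW statistics for correlation terms as listed in Table 3.28
for h in (18.03, 14.89, 15.22, 9.11):
    print(h, round(kw_p_value_df4(h), 4))
# 18.03 0.0012
# 14.89 0.0049
# 15.22 0.0043
# 9.11 0.0584
```

The last value, 0.0584, exceeds 0.05, which is why the interconstraint correlation term is the one exception noted for Spearman problems.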

Although the LP-IP gap for synthetic optimization problems is unknown beforehand,
the insight gained in this section could be useful when defining problem
generation parameters for examining solution procedure performance. This could
be particularly useful when proposed procedures, either exact or heuristic, rely on
LP relaxations during the solution process.


3.10  Discussion  and  Conclusions 

This paper examined 2KP solution procedures using synthetic test problems generated
based on a variety of correlation structures and constraint slackness settings
using both Pearson product-moment and Spearman rank correlation generation
methods. Hooker (1994) states that an alternative to empirical studies with
unrepresentative optimization problem sets is to investigate “how algorithmic performance
depends on problem characteristics.” This study shows that the correlation
structure among test problem coefficients and the type of correlation induced influence
solution procedure performance on 2KP instances.

Not only does the correlation structure matter, but the correlation measure
affects solution procedure performance. Systematically varying the problem correlation
structure yields a more complete range of problems than independent sampling
does. Interconstraint correlation is shown to be a significant factor influencing
performance of solution methods. For a specified correlation structure, a Type U
composite distribution tends to produce a more difficult problem than a Type L
distribution. So, the level of independent sampling with a composition-based sampling
method affects solution procedure performance. Constraint slackness is an
established problem generation parameter, but this study highlights the interaction
between constraint slackness and correlation structure. Finally, for some design
points, CPLEX performed poorly while TOYODA found the optimal solutions.
So, one must always be cautious about generalizing the results observed with one
solution method to other methods. Furthermore, this result indicates that different
test problems may be appropriate for evaluating different types of solution
procedures.

There are several areas of further investigation. For instance, one could examine
other correlation induction methods, more constraint slackness settings, larger test
problems, or problems involving more than two constraints. Another avenue could
examine various types of heuristics, such as was done by Zanakis (1977), or various
optimization methods, to compare how the procedures react to particular test
problem parameter settings. In terms of optimization methods, one could easily
examine how problem generation parameter settings influence the effectiveness
of pre-processing routines such as valid cut generators, bounding procedures, or
problem reduction algorithms.
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CHAPTER  IV 

CONCLUSIONS  AND  DISCUSSION 


Chapter 2 presents a new composition method for generating values of
multivariate random variables with explicit correlation induction. Several new
concepts are introduced during this development: correlation points,
extreme-correlation distributions, and Type L and Type U composite
distributions.
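
The composition idea can be illustrated with a minimal sketch. It is not the
procedure developed in Chapter 2: it assumes the simpler bivariate case, and
the function name sample_composite, its parameters, and the uniform marginals
are hypothetical choices made only for illustration. Each draw comes from the
maximum-correlation distribution with probability p_max, from the
minimum-correlation distribution with probability p_min, and from the joint
distribution under independence otherwise. Because all three components share
the same marginals, the induced Pearson correlation is
p_max * rho_max + p_min * rho_min, which reduces to p_max - p_min for uniform
marginals.

```python
import numpy as np


def sample_composite(inv_cdf_x, inv_cdf_y, p_max, p_min, n, rng):
    """Draw n (x, y) pairs from a three-component mixture (bivariate sketch).

    inv_cdf_x, inv_cdf_y: vectorized inverse CDFs of the two marginals.
    p_max, p_min: mixing probabilities for the maximum- and minimum-
    correlation components; the remainder goes to independent sampling.
    """
    probs = np.array([p_max, p_min, 1.0 - p_max - p_min])
    if np.any(probs < 0.0):
        raise ValueError("mixing probabilities must be nonnegative")

    # Pick a mixture component for every draw: 0 = max, 1 = min, 2 = independent.
    component = rng.choice(3, size=n, p=probs)

    u = rng.random(n)  # drives x in every component
    v = rng.random(n)  # drives y only under independence

    # Same uniform gives maximum correlation; 1 - u gives minimum correlation.
    u_y = np.where(component == 0, u, np.where(component == 1, 1.0 - u, v))
    return inv_cdf_x(u), inv_cdf_y(u_y)


# Hypothetical usage: uniform marginals, target correlation 0.6 - 0.1 = 0.5.
rng = np.random.default_rng(12345)
x, y = sample_composite(lambda u: u, lambda u: u, 0.6, 0.1, 100_000, rng)
print("sample correlation:", np.corrcoef(x, y)[0, 1])  # close to 0.5
```

Using inverse CDFs keeps the marginals fixed while only the coupling between
the uniforms changes, which is why the same sketch would apply to continuous
and discrete marginals alike.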

Using composite distributions for explicit correlation induction has several
benefits. Sampling is easy to implement because the constituent components of
the composite distribution, namely the extreme-correlation distributions and
the joint distribution under independence, are easy to sample from. Many
feasible correlation points have an associated composite distribution;
combined with the applicability of composite distributions to both continuous
and discrete distributions, this yields a method with wide applicability.
Finally, additional modeling flexibility is available because, for nearly all
correlation points expressible as a composite distribution, an entire range of
joint distributions is available.

Chapter 3 applied composite distributions for trivariate random variables in
an empirical study of the 2KP. This study examined three problem generation
factors: the type of correlation, the correlation structure, and constraint
slackness. A heuristic and a branch-and-bound procedure were examined to
determine how performance is influenced by the problem generation factors.
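
To make the constraint slackness factor concrete, the following sketch
assembles a 2KP instance from already-sampled coefficients. It assumes the
common definition of the slackness ratio, in which each right-hand side is a
fixed fraction of the corresponding constraint row sum; the function names,
the coefficient values, and the slackness settings shown are hypothetical and
are not the design points of the study.

```python
import numpy as np


def make_2kp_instance(c, a1, a2, slack1, slack2):
    """Build a two-constraint knapsack instance (illustrative sketch).

    c: objective coefficients; a1, a2: coefficients of the two constraints.
    slack1, slack2: assumed slackness ratios, b_i = slack_i * sum_j a_ij.
    """
    c, a1, a2 = (np.asarray(v, dtype=float) for v in (c, a1, a2))
    if not (0.0 < slack1 <= 1.0 and 0.0 < slack2 <= 1.0):
        raise ValueError("slackness ratios must lie in (0, 1]")
    b = np.array([slack1 * a1.sum(), slack2 * a2.sum()])
    return {"c": c, "A": np.vstack([a1, a2]), "b": b}


def is_feasible(instance, x):
    """Check whether a 0-1 vector x satisfies both knapsack constraints."""
    lhs = instance["A"] @ np.asarray(x, dtype=float)
    return bool(np.all(lhs <= instance["b"] + 1e-9))


# Hypothetical usage with a mixed slackness setting (tight, then loose).
inst = make_2kp_instance([10, 7, 4], [4, 3, 2], [1, 5, 3], 0.5, 0.75)
print(inst["b"], is_feasible(inst, [1, 0, 0]), is_feasible(inst, [1, 1, 1]))
```

A smaller ratio tightens the corresponding constraint, so a mixed setting
makes one constraint bind well before the other.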

The empirical study of the 2KP produced some exciting findings. The type of
correlation, Pearson product-moment or Spearman rank, leads to differences in
solution procedure performance. Each correlation term in the correlation
structure, and in particular the inter-constraint correlation term, is a
significant problem generation parameter. Previous studies have not isolated
the effect of the inter-constraint correlation. Mixed levels of constraint
slackness are found to influence solution procedure performance. Moreover,
this empirical study highlighted the synergistic effect between correlation
structure and constraint slackness levels.

Many research opportunities could follow this effort. Some are mentioned in
Chapters 2 and 3, but there are others. For example, the influence of factors
such as the inter-constraint correlation on CPLEX performance might suggest
new types of algorithms that would be more effective on instances where
current branch-and-bound approaches fare poorly.